BAYESIAN METHODS IN THE SHAPE INVARIANT
MODEL (II): IDENTIFIABILITY AND POSTERIOR
CONTRACTION RATES ON FUNCTIONAL SPACES

By Dominique Bontemps* and Sebastien Gadat* 

Institut de Mathématiques de Toulouse, Université Paul Sabatier

In this paper, we consider the so-called Shape Invariant Model, which stands for the estimation of a function $f^0$ submitted to a random translation of law $g^0$ in a white noise model. We are interested in such a model when the law of the deformations is unknown. We aim to recover the law of the process $\mathbb{P}_{f^0,g^0}$ as well as $f^0$ and $g^0$.

We first provide an identifiability result for this model and then adopt a Bayesian point of view. In this view, we find a prior on $f$ and $g$ such that the posterior distribution concentrates around the functions $f^0$ and $g^0$ when $n$ goes to $+\infty$; we then obtain a contraction rate of order a power of $\log(n)^{-1}$. We also obtain a lower bound on the model for the estimation of $f^0$ and $g^0$ in a frequentist paradigm, which also decreases following a power of $\log(n)^{-1}$.



1. Introduction. We are interested in this work in the so-called Shape Invariant Model (SIM). Such a model aims to describe a statistical process which involves a deformation of a functional shape according to some randomized geometric variability. Such a model has various applications in biology, genetics, imaging science and econometrics (one should refer to [BG13] for a more detailed list of possible applications and references).

In the mathematical community, it has also received a lot of interest, as shown by the numerous references on this subject (see also [BG13]), and various methods have been developed: M-estimation, multi-resolution and harmonic analysis, geometry or semi-parametric statistics. In our study, we consider the general case of an unknown shape submitted to a randomized deformation whose law is also unknown. We adopt here a Bayesian point of view and want to extend the results obtained on the probability laws in [BG13] to the functional elements which parametrize the Shape Invariant Model. Hence, starting from the strategy used in [BG13], we aim to recover a contraction rate of the posterior distribution on the functional objects themselves (shape and mixture law of the deformations), when the number of observations $n$ is growing to $+\infty$. We will use in the sequel quite standard Bayesian nonparametric methods, already introduced in [BG13], to obtain the frequentist consistency and some contraction rates of the Bayesian procedures. We will be interested in this paper in the consistency of the posterior distribution around the functional objects $f^0$ and $g^0$, where $f^0$ is the unknown shape to recover and $g^0$ is the distribution of the nuisance parameter which deforms the shape. In this view, it will be necessary to consider smooth classes for both the shape $f$ and the mixture $g$. This last point is quite different from the situation studied in [BG13], where any mixture distributions (not necessarily smooth) were considered. Thus, we are naturally driven to consider priors on smooth densities: the Dirichlet priors used in [BG13] then become useless, whereas the Gaussian process priors considered in [vdVvZ08a] will be of first importance. Our approach will be adaptive on $f$ but not on $g$: we will assume in the paper that the smoothness parameter (denoted $s$) of $f$ is unknown, whereas the smoothness parameter $\nu$ of $g$ is assumed to be known.

The paper is organised as follows. Section 2 recalls a reduced description of the Shape Invariant Model, provides some notations for mixture models, describes our prior on $(f,g)$ and gives our main results. Section 3 briefly describes the behaviour of the posterior distribution for the new prior defined on $(f,g)$; the main arguments rely on the previous work [BG13]. Section 4 provides some general identifiability results and, up to these identifiability conditions, shows the posterior contraction on the functional objects themselves. At last, Section 5 exploits the Fano Lemma and establishes a lower bound result of reconstruction in a frequentist paradigm. We end the paper with a short concluding section.

2. Model, notations and main results. 

2.1. Statistical settings. 

Shape Invariant Model. We briefly summarize the notations introduced in [BG13] for the random Shape Invariant Model (shortened as SIM in the sequel). We assume $f^0$ to be a "mean pattern" which belongs to a subset $\mathcal{F}$ of smooth functions. We also consider a probability measure $g^0$ which generates random shifts denoted $(\tau_j)_{j=1\dots n}$. We observe $n$ realizations of noisy and randomly shifted complex valued curves $Y_1,\dots,Y_n$ coming from the following white noise model

$$\forall x\in[0,1]\quad\forall j=1\dots n\qquad dY_j(x) := f(x-\tau_j)\,dx+\sigma\,dW_j(x). \tag{2.1}$$

Here, $(W_j)_{j=1\dots n}$ are independent complex standard Brownian motions on $[0,1]$; the noise level $\sigma$ is kept fixed in our study and is set to 1.
In the sequel, $f^{-\tau}$ is the function $x\mapsto f(x-\tau)$. Complex valued curves are considered here for the simplicity of notations. We intensively use the notation "$\lesssim$", which refers to an inequality up to a multiplicative absolute constant. In the meantime, $a\sim b$ stands for $a/b\to1$.

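Functional setting and Fourier analysis. Without loss of generality, the function $f^0$ is assumed to be periodic with period 1 and to belong to a subset $\mathcal{F}$ of $L^2_{\mathbb{C}}([0,1])$, endowed with the norm $\|h\| := \left(\int_0^1|h(s)|^2\,ds\right)^{1/2}$. The complex Fourier coefficients of $h$ are denoted $\theta_\ell(h)$, $\ell\in\mathbb{Z}$. We will often use the parametrisation of $f$ through its Fourier expansion and will simply use the notation $(\theta_\ell)_{\ell\in\mathbb{Z}}$ instead of $(\theta_\ell(f))_{\ell\in\mathbb{Z}}$. Since we aim to consider smooth elements $f$, we are interested in Sobolev spaces and introduce the following useful set of functions (which is a subspace of a Sobolev space):

$$\mathcal{F}_s := \Big\{f\in L^2_{\mathbb{C}}([0,1])\ \Big|\ \theta_1(f)>0\ \text{ and }\ \sum_{\ell\in\mathbb{Z}}(1+|\ell|^{2s})|\theta_\ell(f)|^2<+\infty\Big\}.$$

A second useful set is defined as truncated elements of the former set:

$$\mathcal{F}^{\ell} := \Big\{f\in L^2_{\mathbb{C}}([0,1])\ \Big|\ \theta_1(f)>0\ \text{ and }\ \forall|k|>\ell\quad\theta_k(f)=0\Big\}.$$

We will explain in Section 4 why the positivity constraint on $\theta_1(f)$ is natural for the identifiability of the SIM. In the sequel, we denote the Sobolev norm $\|f\|_{\mathcal{H}_s} := \big(\sum_{\ell\in\mathbb{Z}}(1+|\ell|^{2s})|\theta_\ell(f)|^2\big)^{1/2}$.

In contrast to the picture described in [BG13], we consider only smooth densities, characterised by a regularity parameter $\nu$ and a radius $A$:

$$\mathfrak{M}_\nu([0,1])(A) := \Big\{g\in\mathfrak{M}([0,1])\ \Big|\ \sum_{k\in\mathbb{Z}}|k|^{2\nu}|\theta_k(g)|^2\le A\Big\},$$

where $\mathfrak{M}([0,1])$ is the set of probability measures on $[0,1]$ and $\|\cdot\|$ is the $L^2$ norm defined above. At last, we will also need the set

$$\mathfrak{M}([0,1])^\star := \big\{g\in\mathfrak{M}([0,1])\ \big|\ \forall k\in\mathbb{Z}\quad\theta_k(g)\neq0\big\}.$$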

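Bayesian framework. We consider functional objects $(f^0,g^0)$ belonging to $\mathcal{F}_s\otimes\mathfrak{M}_\nu([0,1])(A)$ and, for any couple $(f,g)\in\mathcal{F}_s\otimes\mathfrak{M}_\nu([0,1])(A)$, equation (2.1) describes the law of one continuous curve, denoted $\mathbb{P}_{f,g}$. We denote $\mathcal{P}$ the set of probability measures over the sample space described by (2.1) when $(f,g)$ varies in $\mathcal{F}_s\otimes\mathfrak{M}_\nu([0,1])(A)$. Given some prior distribution $\Pi_n$ on $\mathcal{P}$, we are interested in the asymptotic behaviour of the posterior distribution defined, for any measurable set $B\subset\mathcal{P}$, by

$$\Pi_n(B\mid Y_1,\dots,Y_n) := \frac{\int_B\prod_{j=1}^np(Y_j)\,d\Pi_n(p)}{\int_{\mathcal{P}}\prod_{j=1}^np(Y_j)\,d\Pi_n(p)},$$

where $p$ denotes the density of an element of $\mathcal{P}$ with respect to a common dominating measure.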

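Mixture model. According to equation (2.1), we can write in the Fourier domain that

$$\forall\ell\in\mathbb{Z}\quad\forall j\in\{1\dots n\}\qquad\theta_\ell(Y_j) = \theta^0_\ell e^{-i2\pi\ell\tau_j}+\xi_{\ell,j}, \tag{2.2}$$

where $\theta^0 := (\theta^0_\ell)_{\ell\in\mathbb{Z}}$ denotes the true unknown Fourier coefficients of $f^0$. The variables $(\xi_{\ell,j})_{\ell,j}$ are independent standard (complex) Gaussian random variables: $\xi_{\ell,j}\sim_{i.i.d.}\mathcal{N}_{\mathbb{C}}(0,1)$, $\forall\ell,j$. For any $p$-dimensional complex vector $z$, $\gamma$ will refer to $\gamma(z) := \pi^{-p}e^{-\|z\|^2}$, the density of the standard complex centered Gaussian distribution $\mathcal{N}_{\mathbb{C}^p}(0,\mathrm{Id})$, and $\gamma_\mu(\cdot) := \gamma(\cdot-\mu)$ is the density of the standard complex Gaussian with mean $\mu$.

For any frequency $\ell$, $\theta_\ell(Y)$ follows a mixture of standard complex Gaussian variables: $\theta_\ell(Y)\sim\int_0^1\gamma_{\theta_\ell e^{-i2\pi\ell\varphi}}\,dg(\varphi)$. Thus, in the sequel, for any phase $\varphi\in[0,1]$ we use the notation $\theta\cdot\varphi := (\theta_\ell e^{-i2\pi\ell\varphi})_{\ell\in\mathbb{Z}}$, and the law of the infinite series of Fourier coefficients of $Y$ is

$$\theta(Y)\sim\int_0^1\gamma_{\theta\cdot\varphi}(\cdot)\,dg(\varphi). \tag{2.3}$$

For any $\theta\in\ell^2_{\mathbb{C}}(\mathbb{Z})$ and any $g\in\mathfrak{M}_\nu([0,1])(A)$, $\mathbb{P}_{\theta,g}$ will refer to the law of the vector $\theta(Y)$ described by this location mixture of Gaussian variables. Following a notation shortcut, $\mathbb{P}_{f,g}$ is the law on curves derived from $\mathbb{P}_{\theta,g}$.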
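The Fourier-domain representation (2.2) is straightforward to simulate, which may help to visualise the mixture structure (2.3). The sketch below is only an illustration: it assumes finitely many non-zero coefficients, and the Beta(2, 5) shift law, the coefficient values and the helper name `simulate_fourier_sim` are hypothetical choices of ours, not objects from the paper.

```python
import numpy as np

def simulate_fourier_sim(theta, shifts, rng):
    """Draw theta_l(Y_j) = theta_l * exp(-2i*pi*l*tau_j) + xi_{l,j} as in (2.2).

    theta  : dict {frequency l: complex theta_l}, finitely many frequencies (illustrative)
    shifts : array of shifts tau_j in [0, 1], assumed to be drawn from g
    Returns an array of shape (n, number of frequencies), columns in sorted(theta) order.
    """
    freqs = np.array(sorted(theta))
    coefs = np.array([theta[l] for l in freqs], dtype=complex)
    n = len(shifts)
    # standard complex Gaussian N_C(0, 1): density exp(-|z|^2)/pi, i.e. independent
    # real and imaginary parts with variance 1/2
    xi = (rng.standard_normal((n, freqs.size))
          + 1j * rng.standard_normal((n, freqs.size))) / np.sqrt(2.0)
    phases = np.exp(-2j * np.pi * np.outer(shifts, freqs))
    return coefs * phases + xi

rng = np.random.default_rng(0)
theta = {0: 0.5, 1: 1.0, 2: 0.3 - 0.2j}   # hypothetical coefficients, with theta_1 > 0
tau = rng.beta(2.0, 5.0, size=5000)        # hypothetical smooth shift law g
Y = simulate_fourier_sim(theta, tau, rng)
print(Y.shape)
print(Y[:, 0].mean())                      # close to theta_0 = 0.5: frequency 0 is not shifted
```

Frequency 0 carries no information on $g$, while every other column is a genuine location mixture of complex Gaussians.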

2.2. Bayesian prior in the randomly shifted curves model. We detail here the Bayesian prior $\Pi_n$ on $\mathcal{P}$ used to obtain a suitable concentration rate. The two parameters $f$ and $g$ are picked independently at random following the next prior distributions: the shape $f$ is sampled according to $\pi$ and the deformation law $g$ is sampled according to $q_{\nu,A}$, both defined below. The prior distribution $\pi$ will be adaptive w.r.t. the Sobolev space where $f^0$ is living. The prior distribution $q_{\nu,A}$ will depend on some knowledge on $g^0$: its regularity and an upper bound of its norm.
Adaptive prior on $f$. The prior is mainly described in [BG13] and defined on $\mathcal{F}_s$ through

$$\pi := \sum_{\ell\ge1}\lambda(\ell)\,\pi_\ell.$$

Given any integer $\ell$, the idea is to decide to randomly switch on, with probability $\lambda(\ell)$, all the Fourier frequencies from $-\ell$ to $+\ell$. We denote $|\mathcal{N}|(0,\sigma^2)$ the law of the absolute value of a real centered Gaussian variable of variance $\sigma^2$. Then, $\pi_\ell$ is defined by $\pi_\ell := \bigotimes_{k\in\mathbb{Z}}\pi_\ell^k$ and

$$\forall k\in\mathbb{Z}\qquad\pi_\ell^k = \mathbb{1}_{|k|>\ell}\,\delta_0+\mathbb{1}_{k\neq1}\times\mathbb{1}_{|k|\le\ell}\,\mathcal{N}_{\mathbb{C}}(0,\xi_\ell^2)+\mathbb{1}_{k=1}\,|\mathcal{N}|(0,\xi_\ell^2).$$

The law on the first Fourier coefficient is slightly different from the one used in [BG13], in order to belong to the identifiability class $\mathcal{F}_s$ (we have to impose the strict positivity of $\theta_1$, the first Fourier coefficient). The randomisation of the selected frequencies is done using $\lambda$, a probability distribution on $\mathbb{N}^*$ which satisfies, for some $\rho\in(1,2)$:

$$\exists(c_1,c_2)\in\mathbb{R}_+^2\quad\forall\ell\in\mathbb{N}^*\qquad e^{-c_1\ell^2\log^\rho\ell}\le\lambda(\ell)\le e^{-c_2\ell^2\log^\rho\ell}.$$

In the sequel, we use a special case of the prior proposed in [BG13]:

$$\xi_\ell = \ell^{-1/4}(1+\log\ell)^{-3/2}. \tag{2.4}$$
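A draw from the adaptive prior $\pi$ can be sketched as below. This is not code from [BG13]: the truncation of $\lambda$ to $\{1,\dots,\ell_{\max}\}$, the constants `c`, `rho` and `ell_max`, and the default standard deviation `xi_default` (our reading of (2.4)) are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def sample_ell(rng, c=0.05, rho=1.5, ell_max=40):
    # lambda(l) proportional to exp(-c l^2 log^rho l), truncated to 1..ell_max (assumption)
    ell = np.arange(1, ell_max + 1)
    logw = -c * ell**2 * np.log(ell) ** rho
    w = np.exp(logw - logw.max())
    return int(rng.choice(ell, p=w / w.sum()))

def xi_default(ell):
    # standard deviation xi_l following our reading of (2.4); treat it as a placeholder
    return ell ** -0.25 * (1.0 + np.log(ell)) ** -1.5

def sample_shape_prior(rng, xi=xi_default, **lambda_kwargs):
    """Return (l, {k: theta_k}) drawn from pi = sum_l lambda(l) pi_l."""
    ell = sample_ell(rng, **lambda_kwargs)
    sd = xi(ell)
    theta = {}
    for k in range(-ell, ell + 1):          # frequencies |k| > l are set to 0 (delta_0)
        if k == 1:
            theta[k] = abs(sd * rng.standard_normal())   # |N|(0, xi_l^2) keeps theta_1 > 0
        else:
            # N_C(0, xi_l^2): E|theta_k|^2 = xi_l^2, split evenly between Re and Im
            theta[k] = sd * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
    return ell, theta

rng = np.random.default_rng(3)
draws = [sample_shape_prior(rng) for _ in range(200)]
print(min(t[1] for _, t in draws) > 0)      # the first coefficient is always positive
```

The half-normal law on $\theta_1$ is what places every draw inside the identifiability class $\mathcal{F}_s$; the other frequencies keep a rotation-invariant complex Gaussian law.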

Non-adaptive prior on $g$. The prior on $\mathfrak{M}_\nu([0,1])(A)$ is the main difference with the one given in [BG13]. We propose to use another prior in this work, since we need some smoothness of $g$ to push our result further than a simple contraction on laws. Such smoothness is not compatible with Dirichlet priors, and even kernel convolutions of Dirichlet processes seem problematic in our situation. Thus, we have chosen to use a prior based on Gaussian processes. More precisely, we assume throughout the paper that we know the smoothness parameter $\nu$ of $g^0$, as well as the radius $A$ of the Sobolev ball where $g^0$ is living.

Given $\nu>1/2$ and $A>0$, we define the integer $k_\nu := \lfloor\nu-1/2\rfloor$ as the largest integer smaller than $\nu-1/2$. We follow the strategy of Section 4 in [vdVvZ08a]; the important point is that we have to take into account the 1-periodicity of the density $g$, as well as its regularity. In this view, we denote $B$ a Brownian bridge between 0 and 1. The Brownian bridge can be obtained from a Brownian motion trajectory $W$ using $B_t = W_t-tW_1$. Then, for any continuous function $f$ on $[0,1]$, we define the linear map

$$J(f):t\mapsto\int_0^tf(s)\,ds-t\int_0^1f(u)\,du,$$

and its compositions $J_1 := J$, $J_k := J_{k-1}\circ J$. Note that $J(f)(0) = J(f)(1) = 0$, so that periodicity is preserved. Moreover, in order to adapt our prior to the successive derivatives of $g$ at points 0 and 1, we use the family of maps $(\psi_k)_{k=1\dots k_\nu}$ defined as

$$\forall t\in[0,1]\qquad\psi_k(t) := \sin(2\pi kt)+\cos(2\pi kt).$$

Our prior is now built as follows: we first sample a real Brownian bridge $(B_\tau)_{\tau\in[0,1]}$ and $Z_1,\dots,Z_{k_\nu}$, i.i.d. real standard normal random variables independent of $B$. This enables us to generate the Gaussian process

$$\forall\tau\in[0,1]\qquad w_\tau := J_{k_\nu}(B)(\tau)+\sum_{i=1}^{k_\nu}Z_i\,\psi_i(\tau). \tag{2.5}$$

Given $(w_\tau,\tau\in[0,1])$ generated by (2.5), we build $p_w$ through

$$\forall\tau\in[0,1]\qquad p_w(\tau) := \frac{e^{w_\tau}}{\int_0^1e^{w_u}\,du}. \tag{2.6}$$

Hence, a prior on Gaussian processes yields a prior on densities on $[0,1]$, which inherits the smoothness $k_\nu$ of the Gaussian process $\tau\mapsto w_\tau$. According to our construction, we now consider the restriction of the prior defined above to the Sobolev ball of radius $2A$. This finally defines a prior distribution $q_{\nu,A}$ on $\mathfrak{M}_\nu([0,1])(2A)$.
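The construction (2.5)-(2.6) can be sketched on a grid as follows. This is a discretised illustration under our own assumptions (trapezoidal integration, a 2001-point grid, hypothetical helper names); it is not the construction of [vdVvZ08a] verbatim, and the final restriction to the Sobolev ball $\mathfrak{M}_\nu([0,1])(2A)$ is not implemented.

```python
import numpy as np

def trapezoid(y, t):
    # trapezoidal rule for int y dt on the grid t
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def J(f, t):
    """Discretised J(f)(t) = int_0^t f(s) ds - t * int_0^1 f(u) du (trapezoidal rule)."""
    F = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(t))])
    return F - t * F[-1]      # vanishes at t = 0 and t = 1, which preserves periodicity

def sample_log_density(rng, nu, n_grid=2001):
    t = np.linspace(0.0, 1.0, n_grid)
    k_nu = int(np.floor(nu - 0.5))
    # Brownian bridge B_t = W_t - t W_1 built from a discretised Brownian motion W
    dW = rng.standard_normal(n_grid - 1) * np.sqrt(np.diff(t))
    W = np.concatenate([[0.0], np.cumsum(dW)])
    w = W - t * W[-1]
    for _ in range(k_nu):     # J_{k_nu}(B): apply J exactly k_nu times
        w = J(w, t)
    for k in range(1, k_nu + 1):
        psi_k = np.sin(2 * np.pi * k * t) + np.cos(2 * np.pi * k * t)
        w = w + rng.standard_normal() * psi_k    # Z_k psi_k with Z_k ~ N(0, 1)
    return t, w

def density_from_process(t, w):
    # p_w = exp(w) / int_0^1 exp(w), as in (2.6)
    e = np.exp(w - w.max())   # shift for numerical stability; cancels after normalisation
    return e / trapezoid(e, t)

rng = np.random.default_rng(4)
t, w = sample_log_density(rng, nu=2.7)   # here k_nu = 2
p = density_from_process(t, w)
print(trapezoid(p, t))                    # 1 up to rounding
```

Since $J$ and the Brownian bridge vanish at both endpoints and $\psi_k(0)=\psi_k(1)$, the sampled log-density takes the same value at 0 and 1, which is the periodicity requirement mentioned above.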

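2.3. Main results. Using the prior distribution $\Pi_n := \pi\otimes q_{\nu,A}$, we will first establish the following result on the randomly shifted curves SIM.

Theorem 2.1. Assume that $f^0\in\mathcal{F}_s$ with $s>1$ and $g^0\in\mathfrak{M}_\nu([0,1])(A)$; then there exists a sufficiently large $M$ such that

$$\Pi_n\big\{\mathbb{P}_{f,g}\ \text{s.t.}\ d_H(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})<M\epsilon_n\ \big|\ Y_1,\dots,Y_n\big\} = 1+o_{\mathbb{P}_{f^0,g^0}}(1)$$

when $n\to+\infty$, where, for an explicit $\kappa>0$:

$$\epsilon_n = n^{-\left[\frac{\nu}{2\nu+1}\wedge\frac{s}{2s+2}\wedge\frac38\right]}\log(n)^{\kappa}.$$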

We also derive in this paper a second set of results, on the objects $f\in\mathcal{F}_s$ and $g\in\mathfrak{M}_\nu([0,1])(A)$ themselves. The first one concerns the identifiability of the model and is stated below.

Theorem 2.2. The Shape Invariant Model is identifiable as soon as $(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}([0,1])^\star$: the canonical application

$$I:(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}([0,1])^\star\longmapsto\mathbb{P}_{f^0,g^0}$$

is injective.
According to this identifiability result, we can derive a somewhat weak result on the posterior convergence towards the true objects $f^0$ and $g^0$.

Theorem 2.3. The two following results hold.

i) Assume that $f^0\in\mathcal{F}_s$ with $s>1$ and $g^0\in\mathfrak{M}_\nu([0,1])(A)$ with $\nu>1$; then there exists a sufficiently large $M$ such that

$$\Pi_n\{g\ \text{s.t.}\ \|g-g^0\|<M\mu_n\mid Y_1,\dots,Y_n\} = 1+o_{\mathbb{P}_{f^0,g^0}}(1)$$

with the contraction rate $\mu_n = (\log n)^{-\nu}$.

ii) In the meantime, assume that $g^0\in\mathfrak{M}_\nu([0,1])(A)$ satisfies the inverse problem assumption:

$$\exists c>0\quad\exists\beta>\nu+\tfrac12\quad\forall k\in\mathbb{Z}\setminus\{0\}\qquad|\theta_k(g^0)|\ge c|k|^{-\beta};$$

then we also have

$$\Pi_n\{f\ \text{s.t.}\ \|f-f^0\|<M\mu_n\mid Y_1,\dots,Y_n\} = 1+o_{\mathbb{P}_{f^0,g^0}}(1)$$

when $n\to+\infty$. Moreover, the contraction rate $\mu_n$ is then of the form $\mu_n = (\log n)^{-\gamma}$, for an explicit exponent $\gamma>0$ depending on $s$, $\nu$ and $\beta$.

We do not know whether such results are asymptotically optimal, since a frequentist minimax rate does not seem to be known for the randomly shifted curves model when both $f$ and $g$ are unknown. In Section 5, we will stress the fact that it is indeed impossible to obtain frequentist convergence rates better than some power of $\log n$, even if our lower bound does not exactly match the upper bound obtained in the previous result.

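Theorem 2.4. Assume that $(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$; then there exist a sufficiently small $c>0$ and explicit positive exponents $a$ and $b$ such that the minimax rate of estimation over $\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ satisfies

$$\liminf_{n\to+\infty}\,(\log n)^{a}\inf_{\hat f\in\mathcal{F}_s}\ \sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat f-f\|^2\ge c,$$

and

$$\liminf_{n\to+\infty}\,(\log n)^{b}\inf_{\hat g}\ \sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat g-g\|^2\ge c.$$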

This result is far from being contradictory with the polynomial rate obtained in Theorem 2.1. One can make at least three remarks:

• The first result provides a contraction rate on the probability distributions in $\mathcal{P}$ and not on the functional space $\mathcal{F}_s$.

• The link between $(f^0,g^0)$ and $\mathbb{P}_{f^0,g^0}$ relies on the identifiability of the model, and the lower bound is derived from a net of functions $(f^i,g^i)_i$ which are really hard to identify through the application $I:(f,g)\mapsto\mathbb{P}_{f,g}$: on this net of functions, the injection is very "flat", the two-by-two differences of the $I(f^i,g^i)$ are as small as possible, and thus the pairs of functions $(f^i,g^i)$ become very hard to distinguish.

• In fact, [BG10] have shown that in the SIM, when $n\to+\infty$, it is impossible to recover the unknown true shifts. The abrupt degradation between the polynomial rates on probability laws in $\mathcal{P}$ and the logarithmic rates on functional objects in $\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ also occurs for this reason. One may argue that such an artefact could be avoided by choosing a different distance on $\mathcal{F}_s$, better suited to our framework, such as

$$d_{\text{Fréchet}}(f_1,f_2) := \inf_{\tau\in[0,1]}\|f_1^{-\tau}-f_2\|.$$

We have not pursued our investigations with this distance on $\mathcal{F}_s$, but obtaining the posterior contraction with such a distance would certainly be nice progress. We expect a polynomial rate, but it is clearly an open (and probably hard) task.
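For shapes given by finitely many Fourier coefficients, the shift-invariant distance $d_{\text{Fréchet}}$ above is easy to approximate, because $f_1^{-\tau}$ has coefficients $\theta_\ell(f_1)e^{-i2\pi\ell\tau}$ and Parseval turns the norm into a sum. The sketch below is illustrative only: the grid search over $\tau$ and the helper name `frechet_shift_distance` are our own choices.

```python
import numpy as np

def frechet_shift_distance(theta1, theta2, freqs, n_tau=2048):
    """Approximate inf_{tau in [0,1]} || f1^{-tau} - f2 || from Fourier coefficients.

    By Parseval, ||f1^{-tau} - f2||^2 = sum_l |theta1_l exp(-2i pi l tau) - theta2_l|^2.
    The infimum is taken over a uniform grid of n_tau shifts (an approximation).
    Returns (distance, minimising shift).
    """
    freqs = np.asarray(freqs)
    tau = np.arange(n_tau) / n_tau
    rotated = np.asarray(theta1)[None, :] * np.exp(-2j * np.pi * np.outer(tau, freqs))
    dist2 = np.sum(np.abs(rotated - np.asarray(theta2)[None, :]) ** 2, axis=1)
    i = int(np.argmin(dist2))
    return float(np.sqrt(dist2[i])), float(tau[i])

rng = np.random.default_rng(5)
freqs = np.arange(-5, 6)
theta1 = (rng.standard_normal(11) + 1j * rng.standard_normal(11)) / (1 + np.abs(freqs)) ** 2
tau0 = 0.25
theta2 = theta1 * np.exp(-2j * np.pi * freqs * tau0)   # coefficients of f1^{-tau0}
d, tau_star = frechet_shift_distance(theta1, theta2, freqs)
print(d, tau_star)                       # numerically zero, at the shift tau0 = 0.25
print(np.linalg.norm(theta1 - theta2))   # the plain L2 distance does not vanish
```

The example illustrates why such a distance is blind to the unrecoverable shifts, whereas $\|\cdot\|$ is not.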

3. Contraction Rate of the posterior distribution. We provide in this section a short proof of Theorem 2.1, since it is almost an extension of the result obtained in [BG13] for a more general class of mixture models. We first recall a useful result established in [BG13], which links the total variation distance between $\mathbb{P}_{f,g}$ and $\mathbb{P}_{\tilde f,g}$ and the $L^2$ norm of $f-\tilde f$.

Lemma 3.1. Let $f$ and $\tilde f$ be any functions in $L^2([0,1])$ and $g$ be any shift distribution in $\mathfrak{M}([0,1])$; then

$$d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{\tilde f,g})\le\frac{\|f-\tilde f\|}{\sqrt2}.$$
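A quick sanity check of the constant in Lemma 3.1 (not the proof from [BG13]): by joint convexity of $d_{TV}$, and since a common shift does not change $\|\theta\cdot\varphi-\tilde\theta\cdot\varphi\| = \|\theta-\tilde\theta\|$, the bound reduces to two standard complex Gaussians with means $|\delta| = \|f-\tilde f\|$ apart, for which $d_{TV}$ is explicit. The function names below are hypothetical.

```python
import math

def tv_shifted_complex_gaussians(delta):
    """Exact d_TV between N_C(mu, 1) and N_C(mu + delta, 1) (density exp(-|z|^2)/pi).

    Projecting on the direction of delta reduces the problem to two real Gaussians with
    standard deviation 1/sqrt(2) whose means differ by |delta|:
    d_TV = 2 * Phi(|delta| / sqrt(2)) - 1 = erf(|delta| / 2).
    """
    return math.erf(abs(delta) / 2.0)

def lemma_3_1_bound(delta):
    # right-hand side of Lemma 3.1 when ||f - f~|| = |delta|
    return abs(delta) / math.sqrt(2.0)

for d in [0.01, 0.1, 0.5, 1.0, 3.0]:
    print(d, tv_shifted_complex_gaussians(d), lemma_3_1_bound(d))
```

For small $|\delta|$ the ratio of the exact distance to the bound tends to $\sqrt{2/\pi}$, so the constant $1/\sqrt2$ is of the right order.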

The next proposition is concerned with the closeness of two laws $\mathbb{P}_{f,g}$ and $\mathbb{P}_{f,\tilde g}$ when we keep the same shape $f\in\mathcal{H}_1$. Consider the inverse functions of the distribution functions, defined by

$$\forall u\in[0,1]\qquad G^{-1}(u) = \inf\{t\in[0,1]:g([0,t])\ge u\}.$$

The Wasserstein (or Kantorovich) distance $W_1$ is given by

$$W_1(g,\tilde g) = \int_0^1\big|G^{-1}(t)-\tilde G^{-1}(t)\big|\,dt.$$
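The quantile formula for $W_1$ translates directly into a numerical routine for densities sampled on a grid. The sketch below assumes a trapezoidal CDF and a midpoint rule in $u$; the names `quantile_function` and `wasserstein1` are ours.

```python
import numpy as np

def quantile_function(t, density, u):
    """G^{-1}(u) = inf{t : G(t) >= u} for a density sampled on the grid t (trapezoidal CDF)."""
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (density[1:] + density[:-1]) * np.diff(t))])
    cdf /= cdf[-1]                                   # guard against quadrature error
    idx = np.searchsorted(cdf, u, side="left")       # first index with cdf >= u
    return t[np.clip(idx, 0, len(t) - 1)]

def wasserstein1(t, g, g_tilde, n_u=4000):
    # W_1 = int_0^1 |G^{-1}(u) - G~^{-1}(u)| du, midpoint rule on (0, 1)
    u = (np.arange(n_u) + 0.5) / n_u
    return float(np.mean(np.abs(quantile_function(t, g, u) - quantile_function(t, g_tilde, u))))

t = np.linspace(0.0, 1.0, 4001)
uniform = np.ones_like(t)
triangle = 2.0 * t                           # density 2t on [0, 1]
print(wasserstein1(t, uniform, uniform))     # identical laws
print(wasserstein1(t, uniform, triangle))    # close to int_0^1 (sqrt(u) - u) du = 1/6
```

For this pair, $d_{TV} = \frac12\int_0^1|1-2t|\,dt = 1/4$, which is indeed larger than $W_1 = 1/6$, in line with the second inequality of the next proposition.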



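Proposition 3.1. Consider $f\in\mathcal{H}_1$, and let $g$ and $\tilde g$ be any measures on $[0,1]$. Then

$$d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f,\tilde g})\le\sqrt2\,\pi\|f\|_{\mathcal{H}_1}W_1(g,\tilde g)\le\sqrt2\,\pi\|f\|_{\mathcal{H}_1}d_{TV}(g,\tilde g)\le\frac{\pi\|f\|_{\mathcal{H}_1}\|g-\tilde g\|}{\sqrt2}.$$

The last two upper bounds are useful in our setting because we only consider distributions that admit regular densities.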

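Proof of Proposition 3.1. Since $G^{-1}(U)\sim g$ when $U$ is uniform on $[0,1]$, we can write $\mathbb{P}_{f,g} = \int_0^1\mathbb{P}_{f,\delta_{G^{-1}(u)}}\,du$. We use this change of variable, the convexity of $d_{TV}$ and Lemma 3.1 to get

$$d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f,\tilde g})\le\int_0^1d_{TV}\big(\mathbb{P}_{f,\delta_{G^{-1}(u)}},\mathbb{P}_{f,\delta_{\tilde G^{-1}(u)}}\big)\,du\le\frac1{\sqrt2}\int_0^1\big\|f^{-G^{-1}(u)}-f^{-\tilde G^{-1}(u)}\big\|\,du.$$

Then, by Parseval,

$$\big\|f^{-G^{-1}(u)}-f^{-\tilde G^{-1}(u)}\big\|^2 = \sum_{k\in\mathbb{Z}}|\theta_k(f)|^2\big|e^{-i2\pi kG^{-1}(u)}-e^{-i2\pi k\tilde G^{-1}(u)}\big|^2,\qquad\big|e^{-i2\pi kG^{-1}(u)}-e^{-i2\pi k\tilde G^{-1}(u)}\big|\le2\pi|k|\,\big|G^{-1}(u)-\tilde G^{-1}(u)\big|,$$

so that $\|f^{-G^{-1}(u)}-f^{-\tilde G^{-1}(u)}\|\le2\pi\,|G^{-1}(u)-\tilde G^{-1}(u)|\,\big(\sum_kk^2|\theta_k(f)|^2\big)^{1/2}$. Therefore we get the first inequality:

$$d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f,\tilde g})\le\sqrt2\,\pi\|f\|_{\mathcal{H}_1}\int_0^1\big|G^{-1}(u)-\tilde G^{-1}(u)\big|\,du.$$

Now, the second inequality is a classical result: see for instance [GS02, Theorem 4]. The last inequality is well known too (it follows from the Cauchy-Schwarz inequality on $[0,1]$). □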
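The key step of the proof above is the shift inequality $\|f^{-a}-f^{-b}\|\le2\pi|a-b|\,\|f\|_{\mathcal{H}_1}$. The following sketch checks it numerically on random shifts for a hypothetical trigonometric polynomial; the coefficient decay and all names are our own illustrative choices.

```python
import numpy as np

def shift_gap(theta, freqs, a, b):
    # || f^{-a} - f^{-b} || via Parseval: sum_k |theta_k|^2 |e^{-2i pi k a} - e^{-2i pi k b}|^2
    k = np.asarray(freqs)
    diff = np.exp(-2j * np.pi * k * a) - np.exp(-2j * np.pi * k * b)
    return float(np.sqrt(np.sum(np.abs(np.asarray(theta) * diff) ** 2)))

def sobolev_h1_norm(theta, freqs):
    # ||f||_{H_1} = (sum_k (1 + k^2) |theta_k|^2)^{1/2}
    k = np.asarray(freqs)
    return float(np.sqrt(np.sum((1 + k**2) * np.abs(np.asarray(theta)) ** 2)))

rng = np.random.default_rng(1)
freqs = np.arange(-10, 11)
theta = (rng.standard_normal(21) + 1j * rng.standard_normal(21)) / (1 + np.abs(freqs)) ** 2
ratios = []
for _ in range(1000):
    a, b = rng.uniform(0.0, 1.0, size=2)
    bound = 2 * np.pi * abs(a - b) * sobolev_h1_norm(theta, freqs)
    ratios.append(shift_gap(theta, freqs, a, b) / bound)
print(max(ratios))   # at most 1, since |e^{ix} - e^{iy}| <= |x - y|
```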



Proof of Theorem 2.1. We mimic the proof of Theorem 2.2 of [BG13].

Complement of the sieve. First, we consider the sieve over $\mathcal{P}$ defined as the set of all possible laws when $f$ has truncated Fourier coefficients and a restricted norm:

$$\mathcal{P}_{k_n,w_n} := \big\{\mathbb{P}_{f,g}:(f,g)\in\mathcal{F}^{k_n}\times\mathfrak{M}_\nu([0,1])(2A),\ \|f\|\le w_n\big\},$$

where $k_n$ is a sequence such that $k_n\to+\infty$ as $n\to+\infty$, and $w_n^2 = 4k_n+2$.

Since our sieve is included in the set of all mixture laws, we can apply 
Proposition 3.10 of [BG13] and get 

Entropy estimates. Since $\mathfrak{M}_\nu([0,1])(2A)\subset\mathfrak{M}([0,1])$, our sieve is included in the sieve considered in [BG13], and we also deduce that, for any sequence $\epsilon_n\to0$:

log D{en, Vk^ ,w,,,dH) < kl log k^ + log — 

Lower bound of the prior mass of Kullback neighbourhoods. We use the description of Kullback neighbourhoods based on our preliminary results. We define

in = ceji (log — ) , an integer such that In^^^ ^ ^ Cn^^**, and the sets 



and 

$$\mathcal{G}_{\epsilon_n} := \{g\in\mathfrak{M}_\nu([0,1])(2A):d_{TV}(g,g^0)\le\epsilon_n\}.$$

We deduce from Lemma 3.1, Proposition 3.1 and the arguments of Proposition 3.9 of [BG13] that, as soon as $f\in\mathcal{F}_{\ell_n}$ and $g\in\mathcal{G}_{\epsilon_n}$, $\mathbb{P}_{f,g}$ belongs to an $\epsilon_n$ Kullback neighbourhood of $\mathbb{P}_{f^0,g^0}$. From Proposition 3.9, we can use the following lower bound of the prior mass on $\mathcal{F}_{\ell_n}$:

According to Theorem A.1 given in the appendix, we know that

since $k_\nu+1/2<\nu$ (see also [LS01] for a very complete survey on small ball probability estimation for Gaussian processes).
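Contraction Rate. We now find a suitable choice of $k_n$ and $\epsilon_n$ in order to satisfy Theorem 2.1 of [GGvdV00], i.e.

$$\log D(\epsilon_n,\mathcal{P}_{k_n,w_n},d_H)\le n\epsilon_n^2.$$

Following the arguments already developed in the proof of Theorem 2.2 of [BG13], we can find $\gamma>0$ and $\kappa>0$ such that the choice

$$\epsilon_n := n^{-\left[\frac{\nu}{2\nu+1}\wedge\frac{s}{2s+2}\wedge\frac38\right]}\log(n)^{\kappa},\qquad k_n = n^{a}\log(n)^{\gamma},$$

for a suitable polynomial exponent $a>0$, fulfils these requirements.

□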

4. Identifiability and semiparametric results. In the Shape Invariant Model, an important issue is the identifiability of the model with respect to the unknown curve $f$ and the unknown mixture law $g$. We first discuss a quite generic identifiability condition for $\mathbb{P}_{f,g}$. Then, we deduce from Theorem 2.1 a contraction rate of the posterior distribution around the true $f^0$ and $g^0$.

4.1. Identifiability of the model. In previous works on the SIM, the identifiability of the model is generally obtained through a restriction on the support of $g$. For instance, [BG10] assume the support of $g$ to be an interval included in $[-1/4,1/4]$ (their shapes are defined on $[-1/2,1/2]$ instead of $[0,1]$ in our paper), and $g$ is assumed to have mean 0, whereas $f$ is supposed to have a non-vanishing first Fourier coefficient ($\theta_1(f)\neq0$). The same kind of condition on the support of $g$ is also assumed in [BG12].

While the condition on the first harmonic of $f$ is imperative to obtain the identifiability of $g$, the restriction on the size of its support seems artificial, and we detail in the sequel how one can avoid such a hypothesis. First, we recall that for any curve $Y$ sampled from the SIM, the first Fourier coefficient is given by $\theta_1(Y) = \theta_1^0e^{-i2\pi\tau}+\xi$ (here $\theta_1^0 = \theta_1(f^0)$). Hence, up to a simple change of variable in $\tau$, we can always modify $g$ into $\tilde g$ such that $\theta_1^0\in\mathbb{R}_+$. It is for instance sufficient to set $\tilde g(\varphi) = g(\varphi+\alpha)$, where $2\pi\alpha$ is the complex argument of $\theta_1^0$. Hence, to impose such an identifiability condition, we have chosen to restrict $f$ to $\mathcal{F}_s$. This condition is not restrictive up to a change of measure for the random variable $\tau$. We now establish the proof of Theorem 2.2.
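The phase normalisation described above can be checked in a few lines. In this sketch (the numerical values are hypothetical), replacing $\tau$ by $\tau-\alpha$ with $2\pi\alpha = \arg\theta_1^0$ leaves every observation unchanged while making the first coefficient real and positive; the other coefficients become $\theta_\ell e^{-i2\pi\ell\alpha}$, i.e. $f$ is replaced by its shift $f^{-\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta1 = 0.8 * np.exp(1j * 2.0)      # a first Fourier coefficient with a non-zero phase
theta2 = 0.3 - 0.1j                  # another coefficient, at frequency 2
tau = rng.uniform(0.0, 1.0, size=5)  # shifts drawn from some g (illustrative)

alpha = (np.angle(theta1) / (2 * np.pi)) % 1.0   # 2*pi*alpha = arg(theta1) modulo 2*pi
tau_new = (tau - alpha) % 1.0                    # new shifts, with law g~(phi) = g(phi + alpha)

# frequency 1: the coefficient becomes |theta1| > 0, the observations do not change
original = theta1 * np.exp(-2j * np.pi * tau)
normalised = abs(theta1) * np.exp(-2j * np.pi * tau_new)
# frequency 2: the coefficient becomes theta2 * exp(-2i*pi*2*alpha)
lhs2 = theta2 * np.exp(-2j * np.pi * 2 * tau)
rhs2 = theta2 * np.exp(-2j * np.pi * 2 * alpha) * np.exp(-2j * np.pi * 2 * tau_new)
print(np.max(np.abs(original - normalised)), np.max(np.abs(lhs2 - rhs2)))   # both numerically zero
```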

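Proof of Theorem 2.2. The demonstration is decomposed into three hierarchical steps. First, we prove that if $\mathbb{P}_{f,g} = \mathbb{P}_{\tilde f,\tilde g}$, then one necessarily has $\theta_1(f) = \theta_1(\tilde f)$. Then we deduce from this point that $g = \tilde g$, and at last we obtain the identifiability for all the other Fourier coefficients of $f$.

Note that as soon as $\nu>1/2$, $g$ and $\tilde g$ admit densities w.r.t. the Lebesgue measure on $[0,1]$. In the sequel we use the same notation $g$ to refer to the density of $g$.

Point 1: Identifiability of $\theta_0(f)$ and $\theta_1(f)$. We denote $\mathbb{P}^k_{f,g}$ the marginal law of $\mathbb{P}_{f,g}$ on the $k$-th Fourier coefficient when the curve follows the Shape Invariant Model (2.2). Of course, we have the following implication:

$$\mathbb{P}_{f,g} = \mathbb{P}_{\tilde f,\tilde g}\implies\forall k\in\mathbb{Z}\quad\mathbb{P}^k_{f,g} = \mathbb{P}^k_{\tilde f,\tilde g}.$$

We immediately obtain that $\theta_0(f) = \theta_0(\tilde f)$, since $\theta_0(f)$ (resp. $\theta_0(\tilde f)$) is the mean of the distribution $\mathbb{P}^0_{f,g}$ (resp. $\mathbb{P}^0_{\tilde f,\tilde g}$). But note that the distribution $\mathbb{P}^0_{f,g}$ does not bring any information on the measure $g$, and is not helpful for its identifiability. Concerning now the first Fourier coefficient, we use the notation $\theta_1 := \theta_1(f)$, $\tilde\theta_1 := \theta_1(\tilde f)$ and remark that

$$d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g}) = \frac1{2\pi}\int_{\mathbb{C}}\left|\int_0^1e^{-|z-\theta_1e^{-i2\pi\alpha}|^2}g(\alpha)\,d\alpha-\int_0^1e^{-|z-\tilde\theta_1e^{-i2\pi\alpha}|^2}\tilde g(\alpha)\,d\alpha\right|dz.$$

Assume now that $\theta_1\neq\tilde\theta_1$, without loss of generality $\theta_1>\tilde\theta_1>0$, and consider the disk $D_{\mathbb{C}}\big(0,\frac{\theta_1-\tilde\theta_1}{2}\big)$. We then get, $\forall z\in D_{\mathbb{C}}\big(0,\frac{\theta_1-\tilde\theta_1}{2}\big)$, $\forall\alpha\in[0,1]$:

$$|\tilde\theta_1e^{-i2\pi\alpha}-z|<\frac{\theta_1+\tilde\theta_1}{2}\quad\text{and}\quad|\theta_1e^{-i2\pi\alpha}-z|>\frac{\theta_1+\tilde\theta_1}{2}.$$

Hence, for all $z\in D_{\mathbb{C}}\big(0,\frac{\theta_1-\tilde\theta_1}{2}\big)$, we get $\int_0^1e^{-|z-\tilde\theta_1e^{-i2\pi\alpha}|^2}\tilde g(\alpha)\,d\alpha>e^{-\frac{(\theta_1+\tilde\theta_1)^2}{4}}$ and of course $\int_0^1e^{-|z-\theta_1e^{-i2\pi\alpha}|^2}g(\alpha)\,d\alpha<e^{-\frac{(\theta_1+\tilde\theta_1)^2}{4}}$. We can thus write the following lower bound of the Total Variation distance:

$$d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g})\ge\frac1{2\pi}\int_{D_{\mathbb{C}}\left(0,\frac{\theta_1-\tilde\theta_1}{2}\right)}\left[\int_0^1e^{-|z-\tilde\theta_1e^{-i2\pi\alpha}|^2}\tilde g(\alpha)\,d\alpha-\int_0^1e^{-|z-\theta_1e^{-i2\pi\alpha}|^2}g(\alpha)\,d\alpha\right]dz>0.$$

Consequently, $d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g}) = 0$ implies that $\theta_1 = \tilde\theta_1$, since $f$ and $\tilde f$ belong to $\mathcal{F}_s$ (both first coefficients are positive).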
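The first-marginal total variation distance used in Point 1 can be approximated numerically, which makes the identifiability statement tangible: identical parameters give 0, while changing either $\theta_1$ or $g$ gives a positive value. This is a rough sketch under our own discretisation choices (midpoint rule in $\alpha$, square grid in $\mathbb{C}$); the test densities and function names are hypothetical.

```python
import numpy as np

def first_marginal_density(theta1, g, z, n_alpha=64):
    """Density of theta_1(Y) = theta1 * exp(-2i*pi*tau) + xi with tau ~ g and xi ~ N_C(0, 1).

    g : vectorised density on [0, 1] (a hypothetical test function, not from the paper)
    z : complex array of evaluation points
    """
    alpha = (np.arange(n_alpha) + 0.5) / n_alpha        # midpoint rule in alpha
    weights = g(alpha)
    weights = weights / weights.sum()
    centres = theta1 * np.exp(-2j * np.pi * alpha)
    sq_dist = np.abs(z[..., None] - centres) ** 2       # |z - theta1 e^{-2 i pi alpha}|^2
    return (np.exp(-sq_dist) @ weights) / np.pi

def tv_first_marginals(theta1, g, theta1_t, g_t, half_width=6.0, n_grid=161):
    """d_TV(P^1_{theta1, g}, P^1_{theta1_t, g_t}) approximated on a square grid in C."""
    x = np.linspace(-half_width, half_width, n_grid)
    X, Y = np.meshgrid(x, x)
    z = X + 1j * Y
    cell = (x[1] - x[0]) ** 2
    p = first_marginal_density(theta1, g, z)
    q = first_marginal_density(theta1_t, g_t, z)
    return 0.5 * float(np.sum(np.abs(p - q))) * cell

uniform = lambda a: np.ones_like(a)
tilted = lambda a: 1.0 + 0.5 * np.cos(2 * np.pi * a)
print(tv_first_marginals(1.0, uniform, 1.0, uniform))   # identical laws
print(tv_first_marginals(1.0, uniform, 0.5, uniform))   # theta_1 differs
print(tv_first_marginals(1.0, uniform, 1.0, tilted))    # g differs
```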
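Point 2: Identifiability of $g$. We still assume that $d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{\tilde f,\tilde g}) = 0$. We know that $\theta_1 = \tilde\theta_1$ and we want to infer that $g = \tilde g$. We are going to establish this result using only the first harmonic of the curves. We denote $h = g-\tilde g$, extended to $\mathbb{R}$ by 1-periodicity. Expanding $|z-\theta_1e^{-i2\pi\alpha}|^2 = |z|^2+\theta_1^2-2\theta_1\mathrm{Re}(ze^{i2\pi\alpha})$ and using the polar change of variables $z = \rho e^{i\varphi}$ together with $u = 2\pi\alpha$, we can write that

$$\begin{aligned}
d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g}) &= \frac1{2\pi}\int_{\mathbb{C}}e^{-(|z|^2+\theta_1^2)}\left|\int_0^1e^{2\theta_1\mathrm{Re}(ze^{i2\pi\alpha})}h(\alpha)\,d\alpha\right|dz\\
&= \frac1{4\pi^2}\int_0^{+\infty}\int_0^{2\pi}\rho e^{-(\rho^2+\theta_1^2)}\left|\int_0^{2\pi}e^{2\rho\theta_1\cos(u+\varphi)}h\Big(\frac{u}{2\pi}\Big)du\right|d\varphi\,d\rho\\
&= \frac1{4\pi^2}\int_0^{+\infty}\rho e^{-(\rho^2+\theta_1^2)}\int_0^{2\pi}\big|\psi_{2\rho\theta_1}(\varphi)\big|\,d\varphi\,d\rho.
\end{aligned}$$

In the expression above, $\psi_a(\varphi)$ is defined as

$$\psi_a(\varphi) := \int_0^{2\pi}e^{a\cos u}\,h\Big(\frac{u-\varphi}{2\pi}\Big)du.$$

Of course, $|\psi_a|$ is upper bounded by $4\pi e^a$ (since $\int_0^1|h|\le2$), and a very rough inequality yields $|\psi_a(\varphi)|\ge\frac{|\psi_a(\varphi)|^2}{4\pi e^a}$. Hence, denoting $\|\psi_a\|^2 := \int_0^{2\pi}|\psi_a(\varphi)|^2\,d\varphi$,

$$d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g})\ge\frac1{16\pi^3}\int_0^{+\infty}\rho e^{-(\theta_1^2+\rho^2+2\rho\theta_1)}\|\psi_{2\rho\theta_1}\|^2\,d\rho. \tag{4.1}$$

Using the fact that $\nu>1$, $h$ may be expanded in Fourier series since $h\in L^2([0,1])$:

$$h(x) = \sum_{n\in\mathbb{Z}}c_n(h)e^{i2\pi nx},$$

and we can also obtain the Fourier decomposition of $\psi_a$:

$$\psi_a(\varphi) = \sum_{n\in\mathbb{Z}}c_n(h)e^{-in\varphi}\int_0^{2\pi}e^{a\cos u}e^{inu}\,du.$$

Thus, the norm of $\psi_a$ is given by

$$\|\psi_a\|^2 = 2\pi\sum_{n\in\mathbb{Z}}|c_n(h)|^2\left|\int_0^{2\pi}e^{a\cos u}e^{inu}\,du\right|^2. \tag{4.2}$$

Now, denote by $(T_n)_{n\in\mathbb{N}}$ and $(U_n)_{n\in\mathbb{N}}$ the Tchebychev polynomials of the first and second kind, which satisfy $T_n(\cos\vartheta) = \cos(n\vartheta)$ and $(\sin\vartheta)\,U_{n-1}(\cos\vartheta) = \sin(n\vartheta)$. For $n\ge1$ (for real $a$, replacing $n$ by $-n$ only conjugates the integral below, which will turn out to be real), we can decompose

$$\int_0^{2\pi}e^{a\cos u}e^{inu}\,du = \int_0^{2\pi}e^{a\cos u}\big[T_n(\cos u)+i(\sin u)U_{n-1}(\cos u)\big]du = \sum_{k\ge0}\frac{a^k}{k!}\int_0^{2\pi}(\cos u)^k\big[T_n(\cos u)+i(\sin u)U_{n-1}(\cos u)\big]du,$$

where $U_{n-1}$ is the polynomial given by

$$U_{n-1}(\cos u) = \sum_{j=0}^{E((n-1)/2)}(-1)^j\binom{n}{2j+1}(\cos u)^{n-2j-1}(1-\cos^2u)^j.$$

Each integral $\int_0^{2\pi}(\sin u)P(\cos u)\,du$, with $P$ a polynomial, vanishes since its integrand is the derivative of the $2\pi$-periodic function $-Q(\cos u)$, where $Q' = P$. Hence, we obtain

$$\int_0^{2\pi}e^{a\cos u}e^{inu}\,du = \int_0^{2\pi}\sum_{k\ge0}\frac{a^k}{k!}(\cos u)^kT_n(\cos u)\,du = \int_0^{2\pi}e^{a\cos u}\cos(nu)\,du\in\mathbb{R}\quad\text{if }a\in\mathbb{R}.$$

We denote $A_n$ the following (holomorphic) function of the variable $a$:

$$A_n(a) := \int_0^{2\pi}e^{a\cos u}\cos(nu)\,du,$$

and equation (4.2) yields

$$\|\psi_a\|^2 = 2\pi\sum_{n\in\mathbb{Z}}|c_n(h)|^2A_n(a)^2. \tag{4.3}$$

Moreover, for each $n\ge1$, $A_n$ is not the null function (recall that $A_{-n} = A_n$ and $A_0>0$ on $\mathbb{R}$); otherwise it would be the case for each of its derivatives. But remark that $(\cos u)^n$ may be decomposed in the basis $(T_k)$ as $(\cos u)^n = \sum_{k=0}^{n-1}a_kT_k(\cos u)+2^{1-n}T_n(\cos u)$, and using successive derivations

$$A_n^{(n)}(0) = \int_0^{2\pi}(\cos u)^n\cos(nu)\,du = \int_0^{2\pi}\Big[\sum_{k=0}^{n-1}a_kT_k(\cos u)+2^{1-n}T_n(\cos u)\Big]T_n(\cos u)\,du = 2^{1-n}\pi>0.$$

Note that in the meantime, we also obtain that $A_n^{(j)}(0) = 0$, $\forall j<n$, so that

$$A_n(a)\underset{a\to0}{\sim}\frac{2^{1-n}\pi}{n!}a^n. \tag{4.4}$$

We can conclude the proof of the identifiability of $g$ by using (4.3) in (4.1) to obtain

$$0 = d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g})\ge\frac1{8\pi^2}\sum_{n\in\mathbb{Z}}|c_n(h)|^2\underbrace{\int_0^{+\infty}\rho e^{-(\theta_1^2+\rho^2+2\rho\theta_1)}A_n(2\rho\theta_1)^2\,d\rho}_{:=I_n(\theta_1)}.$$

From (4.4), $A_n(2\rho\theta_1)\neq0$ for small $\rho>0$, so that each integral $I_n(\theta_1)$ is positive, $\forall n\in\mathbb{Z}$. Hence $c_n(h) = 0$ for all $n$, and we conclude that

$$g = \tilde g\quad\text{and}\quad\theta_1 = \tilde\theta_1.$$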
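The function $A_n$ is, up to the factor $2\pi$, the modified Bessel function of the first kind, $A_n(a) = 2\pi I_n(a) = 2\pi\sum_{k\ge0}\frac{(a/2)^{2k+n}}{k!(k+n)!}$, whose leading term is exactly the equivalent (4.4). The sketch below compares a quadrature of the defining integral with this series and with (4.4); the function names and the truncation of the series are our own choices.

```python
import math
import numpy as np

def A_n_quadrature(n, a, m=4096):
    # A_n(a) = int_0^{2 pi} exp(a cos u) cos(n u) du; the rectangle rule is spectrally
    # accurate for smooth 2*pi-periodic integrands
    u = 2 * np.pi * np.arange(m) / m
    return float(np.sum(np.exp(a * np.cos(u)) * np.cos(n * u)) * 2 * np.pi / m)

def A_n_series(n, a, terms=60):
    # 2*pi*I_n(a) with I_n(a) = sum_k (a/2)^{2k+n} / (k! (k+n)!)
    n = abs(n)
    return 2 * math.pi * sum((a / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
                             for k in range(terms))

def A_n_small_a(n, a):
    # leading term (4.4): A_n(a) ~ 2^{1-n} * pi * a^n / n! as a -> 0
    n = abs(n)
    return 2.0 ** (1 - n) * math.pi * a**n / math.factorial(n)

for n in range(4):
    print(n, A_n_quadrature(n, 1.0), A_n_series(n, 1.0), A_n_series(n, 1e-3) / A_n_small_a(n, 1e-3))
```

All the Taylor coefficients of $A_n$ are non-negative, which is another way to see that $A_n(a)>0$ for $a>0$ and hence that each $I_n(\theta_1)$ is positive.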



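Point 3: Identifiability of $f$. We end the argument and prove that $\mathbb{P}_{f,g} = \mathbb{P}_{\tilde f,\tilde g}$ implies $f = \tilde f$. We already know that $g = \tilde g$, and it remains to establish the equality of all the Fourier coefficients whose frequency is different from 0 and 1. By an argument similar to the one used for the identifiability of $\theta_1$ (Point 1), we can easily show that

$$d_{TV}(\mathbb{P}^k_{f,g},\mathbb{P}^k_{\tilde f,g}) = 0\implies|\theta_k| = |\tilde\theta_k|.$$

But we cannot directly conclude here, since it is not reasonable to restrict the phase of each of the other coefficients $\theta_k(f)$ to a special value (as is the case for $\theta_1(f)$, which is positive). We thus write $\tilde\theta_k = \theta_ke^{i\varphi}$. Since $g = \tilde g$, we have

$$0 = d_{TV}(\mathbb{P}^k_{f,g},\mathbb{P}^k_{\tilde f,g}) = \frac1{2\pi}\int_{\mathbb{C}}|F(z)|\,dz,\qquad F(z) := \int_0^1\Big(e^{-|z-\theta_ke^{-i2\pi k\alpha}|^2}-e^{-|z-\theta_ke^{i(\varphi-2\pi k\alpha)}|^2}\Big)g(\alpha)\,d\alpha.$$

Now, if one considers $z = x+iy$, $F$ is differentiable with respect to $x$ and $y$, and $F(0) = 0$. A simple computation of $\nabla F(0)$ shows that $\nabla F(0)$ is the vector (written in complex form)

$$\nabla F(0) = 2\theta_ke^{-|\theta_k|^2}c_k(g)\big[1-e^{i\varphi}\big],\qquad c_k(g) := \int_0^1g(\alpha)e^{-i2\pi k\alpha}\,d\alpha.$$

Since $g\in\mathfrak{M}([0,1])^\star$, this last term is non-vanishing except if $\theta_k = 0$ (which trivially implies that $\theta_k = \tilde\theta_k = 0$) or if $\varphi\equiv0\ (2\pi)$. In both cases, $\nabla F(0) = 0\iff\theta_k = \tilde\theta_k$. Thus, as soon as $\theta_k\neq\tilde\theta_k$, we have $\nabla F(0)\neq0$, and we may find a neighbourhood of 0, denoted $B(0,r)$, such that $|F(z)|>0$ when $z\in B(0,r)\setminus\{0\}$, which contradicts $\int_{\mathbb{C}}|F| = 0$ since $F$ is continuous. This is sufficient to end the proof of identifiability. □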
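The gradient formula used in Point 3 can be checked by central finite differences. The sketch below evaluates $F$ by a midpoint rule in $\alpha$ for a hypothetical positive density $g$ with $c_2(g)\neq0$; the density, the parameter values and the function names are illustrative assumptions.

```python
import numpy as np

def F(z, theta_k, phi, k, g, m=2048):
    # F(z) = int_0^1 (exp(-|z - theta_k e^{-2i pi k a}|^2) - exp(-|z - theta_k e^{i phi} e^{-2i pi k a}|^2)) g(a) da
    alpha = (np.arange(m) + 0.5) / m
    w = g(alpha) / m                                  # midpoint weights for g(alpha) d alpha
    rot = np.exp(-2j * np.pi * k * alpha)
    a = np.exp(-np.abs(z - theta_k * rot) ** 2)
    b = np.exp(-np.abs(z - theta_k * np.exp(1j * phi) * rot) ** 2)
    return float(np.sum((a - b) * w))

def grad_F_at_0_fd(theta_k, phi, k, g, h=1e-6):
    # complex form dF/dx + i dF/dy at z = 0, by central differences
    dx = (F(h, theta_k, phi, k, g) - F(-h, theta_k, phi, k, g)) / (2 * h)
    dy = (F(1j * h, theta_k, phi, k, g) - F(-1j * h, theta_k, phi, k, g)) / (2 * h)
    return dx + 1j * dy

def grad_F_at_0_formula(theta_k, phi, k, g, m=2048):
    alpha = (np.arange(m) + 0.5) / m
    c_k = np.sum(g(alpha) * np.exp(-2j * np.pi * k * alpha)) / m       # c_k(g)
    return 2 * np.exp(-abs(theta_k) ** 2) * theta_k * c_k * (1 - np.exp(1j * phi))

g = lambda a: 1.0 + 0.5 * np.cos(2 * np.pi * a) + 0.3 * np.sin(4 * np.pi * a)   # positive density
theta_k, phi, k = 0.7 - 0.4j, 1.1, 2
print(grad_F_at_0_fd(theta_k, phi, k, g))
print(grad_F_at_0_formula(theta_k, phi, k, g))   # agrees with the finite differences
```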

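In a sense, the main difficulty of the proof above is the implication $d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{\tilde f,\tilde g}) = 0\implies g = \tilde g$. Then, the identifiability follows from a chaining argument $\theta_1(f)\Rightarrow g\Rightarrow\theta_k(f)$, $\forall k\notin\{0,1\}$. We will see that this part of the proof can also be used to obtain a contraction rate for $f$ and $g$ around $f^0$ and $g^0$. We recall here the main inequality used above: for all $\theta_1>0$ and all $(g,\tilde g)\in\mathfrak{M}_\nu([0,1])(A)^2$, the identifiability of $g$ is expressed by

$$d_{TV}(\mathbb{P}^1_{\theta_1,g},\mathbb{P}^1_{\theta_1,\tilde g})\ge\frac1{8\pi^2}\sum_{n\in\mathbb{Z}}|c_n(g-\tilde g)|^2\int_0^{+\infty}\rho e^{-(\rho+\theta_1)^2}A_n(2\rho\theta_1)^2\,d\rho. \tag{4.5}$$

The aim of the next paragraph is to exploit this inequality to produce a contraction rate of $g$ around $g^0$.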

4.2. Contraction rate of the posterior distribution around $f^0$ and $g^0$.

4.2.1. Link with deconvolution with unknown variance operator. We provide in this section an upper bound on the contraction rate of the posterior law around $f^0$ and $g^0$. This question is somewhat natural owing to the identifiability result obtained in the previous section. We thus assume for the rest of the paper that $f^0\in\mathcal{F}_s$ and $g^0\in\mathfrak{M}_\nu([0,1])(A)$ for some parameters $s>1$ and $\nu>1$.

Remark first that our problem, written in the Fourier domain, seems strongly related to the standard deconvolution with unknown variance setting. For instance, the first observable Fourier coefficients are

$$\theta_1(Y_j) = \theta_1e^{-i2\pi\tau_j}+\xi_{1,j},\qquad\forall j\in\{1\dots n\},$$

and, up to a division by $\theta_1$, they can also be parametrised as

$$\frac{\theta_1(Y_j)}{\theta_1} = e^{-i2\pi\tau_j}+\frac{\xi_{1,j}}{\theta_1},\qquad\forall j\in\{1\dots n\}, \tag{4.6}$$

which is very similar to the problem $Y = X+\epsilon$ studied for instance by [Mat02], where $\epsilon$ follows a Gaussian law whose variance (here $1/\theta_1^2$) is unknown. As pointed out in [Mat02] (see also the more recent work [BM05], where similar situations are extensively detailed), such a setting is rather unfavourable for statistical estimation, since convergence rates are generally of logarithmic order. Such a phenomenon also occurs in our setting, except for the first Fourier coefficient of $f$, as pointed out in the next proposition.

The roadmap of this paragraph is similar to the proof of Theorem 2.2. We first provide a simple lower bound of $d_{TV}$ which enables us to conclude for the first Fourier coefficient. Then, we still use the first marginal to compute a contraction rate for the posterior distribution on $g$ around $g^0$. At last, we chain all these results to provide a contraction rate for the posterior distribution on $f$ around $f^0$.

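4.2.2. Contraction rate on the first Fourier coefficient.

Proposition 4.1. Assume that $(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$; then the posterior distribution satisfies

$$\Pi_n\big(\theta_1\ \text{s.t.}\ |\theta_1-\theta_1^0|>M\epsilon_n^{1/3}\ \big|\ Y_1,\dots,Y_n\big)\to0$$

in $\mathbb{P}_{f^0,g^0}$ probability as $n\to+\infty$, for a sufficiently large $M$. The contraction rate around the true Fourier coefficient is thus at least

$$n^{-\frac13\times\left[\frac{\nu}{2\nu+1}\wedge\frac{s}{2s+2}\wedge\frac38\right]}(\log n)^{\kappa/3}.$$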

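Proof. The demonstration is quite simple. Using the beginning of the proof of Theorem 2.2, one can bound from below, for any $\theta_1$ such that $0<\eta<|\theta_1-\theta_1^0|<\theta_1^0/2$ and any $g\in\mathfrak{M}_\nu([0,1])(A)$, the Total Variation distance between $\mathbb{P}_{f,g}$ and $\mathbb{P}_{f^0,g^0}$. Remark that

$$d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})\ge d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{f^0,g^0})$$

owing to the restriction of $\mathbb{P}_{f,g}$ to the first Fourier marginal and the variational definition of the Total Variation distance. Then, assuming for instance $\theta_1>\theta_1^0$ (the other case is symmetric) and integrating over the disk $D_{\mathbb{C}}\big(0,\frac{|\theta_1-\theta_1^0|}{4}\big)$ as in Point 1,

$$\begin{aligned}
d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{f^0,g^0}) &\ge\frac1{2\pi}\int_{D_{\mathbb{C}}\left(0,\frac{|\theta_1-\theta_1^0|}{4}\right)}\left|\int_0^1g(\alpha)e^{-|z-\theta_1e^{-i2\pi\alpha}|^2}d\alpha-\int_0^1g^0(\alpha)e^{-|z-\theta_1^0e^{-i2\pi\alpha}|^2}d\alpha\right|dz\\
&\ge\frac{\eta^2}{32}\left(e^{-(3\theta_1^0+\theta_1)^2/16}-e^{-(3\theta_1+\theta_1^0)^2/16}\right)\\
&\ge C(\theta_1^0)\,\eta^3,
\end{aligned}$$

for a suitable, small enough, constant $C(\theta_1^0)$. Now, one can use simple inclusions and the classical comparison $d_{TV}\lesssim d_H$ between the Total Variation and Hellinger distances:

$$\{\theta_1\in B(\theta_1^0,\eta)^c\}\subset\{\theta_1\mid d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})>C(\theta_1^0)\eta^3\}\subset\{\theta_1\mid d_H(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})>C'(\theta_1^0)\eta^3\}.$$

The proof is now achieved according to Theorem 2.1, with $\eta$ of order $\epsilon_n^{1/3}$. □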

4.2.3. Posterior contraction rate around $g^0$. We now study the contraction rate of the posterior distribution around the true mixture law $g^0$. This result is stated below.

Theorem 4.1. Assume $(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$; then

$$\Pi_n\big(g:\ \|g-g^0\|>M\log^{-\nu}(n)\ \big|\ Y_1,\dots,Y_n\big)\to0$$

in $\mathbb{P}_{f^0,g^0}$ probability as $n\to+\infty$, for a sufficiently large $M$.

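Proof. We first restrict ourselves to the first marginal on Fourier coefficients, as before. Using Theorem 2.1, we know that

$$\Pi_n\{\mathbb{P}_{f,g}\ \text{s.t.}\ d_H(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})>M\epsilon_n\mid Y_1,\dots,Y_n\}\to0\quad\text{as }n\to+\infty.$$

Since

$$d_{TV}(\mathbb{P}^1_{\theta_1,g},\mathbb{P}^1_{\theta_1^0,g^0}) = d_{TV}(\mathbb{P}^1_{f,g},\mathbb{P}^1_{f^0,g^0})\le d_{TV}(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0})\le d_H(\mathbb{P}_{f,g},\mathbb{P}_{f^0,g^0}),$$

we then get

$$\Pi_n\{\mathbb{P}_{f,g}\ \text{s.t.}\ d_{TV}(\mathbb{P}^1_{\theta_1,g},\mathbb{P}^1_{\theta_1^0,g^0})>M\epsilon_n\mid Y_1,\dots,Y_n\}\to0\quad\text{as }n\to+\infty. \tag{4.7}$$

For any $g\in\mathfrak{M}_\nu([0,1])(A)$, the triangle inequality yields

$$d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1,g})+d_{TV}(\mathbb{P}^1_{\theta_1,g},\mathbb{P}^1_{\theta_1^0,g^0})\ge d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1^0,g^0}). \tag{4.8}$$

Now, let $\tilde f$ be defined by $\theta_1(\tilde f) = \theta_1(f)$ and, for any $k\in\mathbb{Z}\setminus\{1\}$, $\theta_k(\tilde f) = \theta_k(f^0)$. Then Lemma 3.1 yields

$$d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1,g}) = d_{TV}(\mathbb{P}^1_{f^0,g},\mathbb{P}^1_{\tilde f,g})\le\frac{\|\tilde f-f^0\|}{\sqrt2} = \frac{|\theta_1-\theta_1^0|}{\sqrt2}.$$

Therefore, by Proposition 4.1,

$$\Pi_n\left(\mathbb{P}_{f,g}\ \text{s.t.}\ d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1,g})<\frac{M\epsilon_n^{1/3}}{\sqrt2}\ \Big|\ Y_1,\dots,Y_n\right)\ge\Pi_n\left(\mathbb{P}_{f,g}\ \text{s.t.}\ |\theta_1-\theta_1^0|<M\epsilon_n^{1/3}\ \Big|\ Y_1,\dots,Y_n\right)\to1 \tag{4.9}$$

as $n\to+\infty$. In conclusion, we deduce from (4.7), (4.8) and (4.9) that, for $M$ large enough,

$$\Pi_n\left(\mathbb{P}_{f,g}\ \text{s.t.}\ d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1^0,g^0})<M\epsilon_n^{1/3}\ \Big|\ Y_1,\dots,Y_n\right)\to1\quad\text{as }n\to+\infty.$$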


We then use equation (4.5) applied with $\theta_1 = \theta_1^0$ together with the last equation to obtain our rate of consistency. Remark that
(4.10) 

-1 f + OO 



where we have used the definition

$$ A_n(a) := \int_0^{2\pi}e^{a\cos(u)}\cos(nu)\,du. $$







Now, we use the equivalents given by Lemma B.1 detailed in the Appendix. We only keep the integral of $A_n$ for $a\in[0,C\sqrt{n}]$, since it can be shown that the tail of this integral yields a negligible term, and we simply use the equivalent given by (B.1).

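One can find a sufficiently small constant $\eta$ such that

$$ \int_0^{+\infty}\rho\,e^{-\rho^2-|\theta_1^0|^2}A_n(2\rho\theta_1^0)^2\,d\rho \geq \int_0^{\eta}\rho\,e^{-\rho^2-|\theta_1^0|^2}\left(\frac{2\pi(\rho\theta_1^0)^n}{n!}\right)^2d\rho \geq \frac{4\pi^2(\theta_1^0)^{2n}e^{-(|\theta_1^0|^2+\eta^2)}}{(n!)^2}\,\frac{\eta^{2n+2}}{2n+2}. $$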

Now, we can apply the Stirling formula $n!\sim\sqrt{2\pi n}\,(n/e)^n$ to obtain:

$$ \frac{4\pi^2(\theta_1^0)^{2n}e^{-(|\theta_1^0|^2+\eta^2)}}{(n!)^2}\,\frac{\eta^{2n+2}}{2n+2} \sim \frac{2\pi\,\eta^2e^{-(|\theta_1^0|^2+\eta^2)}}{n(2n+2)}\exp\left(-2n\log\frac{n}{e\,\eta\,\theta_1^0}\right). $$

Hence, for any $c>2$ and $n$ large enough, this last term is lower bounded by $C(\theta_1^0)e^{-cn\log n}$. As a consequence, we can plug this lower bound into (4.10) to get



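$$ d_{TV}(\mathbb{P}^1_{\theta_1^0,g},\mathbb{P}^1_{\theta_1^0,g^0}) \geq c\,|c_k(g-g^0)|^2e^{-ck\log k} $$

for $c$ sufficiently small. We now end the proof of the Theorem: choose a frequency cut-off $k_n$ that depends on $n$ and remark that

$$ \forall g\in\mathfrak{M}_\nu([0,1])(A),\quad \|g-g^0\|^2 = \sum_{|\ell|<k_n}|c_\ell(g-g^0)|^2 + \sum_{|\ell|\geq k_n}|c_\ell(g-g^0)|^2 \lesssim \sum_{|\ell|<k_n}|c_\ell(g-g^0)|^2 + k_n^{-2\nu}. $$

From the convergence deduced from (4.7), (4.8) and (4.9), and from the lower bound above, we know that the last bound is smaller than $e^{ck_n\log k_n}\epsilon_n^{1/3}+k_n^{-2\nu}$ up to a multiplicative constant, with probability close to 1 as $n$ goes to $+\infty$. The optimal choice for $k_n$ balances these two terms and yields

$$ (c\,k_n+2\nu)\log k_n = \frac{1}{3}\log\frac{1}{\epsilon_n}. $$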

This thus ensures that

$$ \Pi_n\{g \text{ s.t. } \|g-g^0\|^2 < M\log(n)^{-2\nu} \mid Y_1,\ldots,Y_n\} \to 1 \quad\text{as } n\to+\infty. $$

□ 
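As a numerical illustration of the last balancing step, the following Python sketch solves $(c\,k_n+2\nu)\log k_n = \frac{1}{3}\log(1/\epsilon_n)$ for $k_n$ by bisection. The rate $\epsilon_n = n^{-1/2}$, the constants $c = 1$ and $\nu = 1$, and the name `cutoff` are illustrative assumptions rather than values fixed by the paper.

```python
import math


def cutoff(eps, nu, c=1.0):
    """Solve (c*k + 2*nu) * log(k) = log(1/eps) / 3 for k > 1 by bisection.

    This balances exp(c*k*log k) * eps**(1/3) against k**(-2*nu).
    Requires 0 < eps < 1 so that the right-hand side is positive.
    """
    target = math.log(1.0 / eps) / 3.0

    def h(k):
        return (c * k + 2.0 * nu) * math.log(k) - target

    lo, hi = 1.0, 2.0
    while h(hi) < 0.0:        # h(1) = -target < 0 and h increases on (1, inf)
        hi *= 2.0
    for _ in range(200):      # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    nu = 1.0
    for n in (1e4, 1e8, 1e16, 1e32):
        eps = n ** -0.5       # hypothetical polynomial rate eps_n = n^{-1/2}
        k = cutoff(eps, nu)
        print(f"n={n:.0e}  k_n={k:6.3f}  k_n^(-2nu)={k ** (-2 * nu):.3e}  "
              f"log(n)^(-2nu)={math.log(n) ** (-2 * nu):.3e}")
```

In such runs $k_n$ grows only slowly with $n$ (roughly like $\log n/\log\log n$), so the tail term $k_n^{-2\nu}$ decays logarithmically, in line with Theorem 4.1.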

In the last proof, we have used the knowledge of $\nu$, as well as of the radius $A$ of the Sobolev space $\mathfrak{M}_\nu([0,1])(A)$, in the last lines to build a suitable threshold $k_n$. Without this assumption, we cannot easily control, from the behaviour of the posterior distribution around $\mathbb{P}_{f^0,g^0}$, the posterior weights on $\mathfrak{M}_\nu([0,1])(A)$: this is why it is difficult to conclude with an adaptive prior.

4.2.4. Posterior contraction rate around $f^0$. We now aim to obtain a similar result for the posterior weight on neighbourhoods of $f^0$. Even if our results are quite good for the first coefficient $\theta_1$, we will see that this is far from being the case for the rest of the Fourier expansion.

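Theorem 4.2. Assume $(f^0,g^0)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ and

$$ \exists c>0,\ \exists\beta>\nu+\tfrac{1}{2},\ \forall k\in\mathbb{Z}^*,\quad |c_k(g^0)| \geq c|k|^{-\beta}; $$

then

$$ \Pi_n\left(f : \|f-f^0\|^2 > M(\log n)^{-\frac{4\nu s}{2\beta+2s+1}}\,\middle|\,Y_1,\ldots,Y_n\right) \to 0 $$

in $\mathbb{P}_{f^0,g^0}$ probability as $n\to+\infty$, for a sufficiently large $M$.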



Proof. The idea of the proof is very similar to the previous arguments: we study the posterior weight on neighbourhoods of the true Fourier coefficients of $f^0$ whose frequency is larger than 1.

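Point 1: Triangle inequality. For any $f\in\mathcal{F}_s$ and any $k\in\mathbb{Z}$, we have

$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f^0,g^0}) \leq d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) + d_{TV}(\mathbb{P}^k_{f^0,g^0},\mathbb{P}^k_{f,g}). $$

The second term does not exceed $M\epsilon_n$ (and $\epsilon_n\lesssim\log(n)^{-\nu}$) with a probability tending to 1; more precisely,

$$ \Pi_n\left(\forall k\in\mathbb{Z},\ d_{TV}(\mathbb{P}^k_{f,g},\mathbb{P}^k_{f^0,g^0}) < M\epsilon_n\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 \tag{4.11} $$

as $n\to+\infty$.

Point 2: $\Pi_n\left(\sup_{k\in\mathbb{Z}}d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) < M\log(n)^{-\nu}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1$. To obtain such a limit, we can first use the Cauchy-Schwarz inequality as follows: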



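$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) = \frac{1}{2\pi}\int_{\mathbb{C}}\left|\int_0^{2\pi}e^{-|z-\theta_ke^{-ik\varphi}|^2}(g-g^0)(\varphi)\,d\varphi\right|dz \leq \frac{\|g-g^0\|}{2\pi}\int_{\mathbb{C}}\left(\int_0^{2\pi}e^{-2|z-\theta_ke^{-ik\varphi}|^2}d\varphi\right)^{1/2}dz. $$

Now, the Young inequality implies that for any $M>0$,

$$ |z-\theta_ke^{-ik\varphi}|^2 \geq \left(1-\frac{1}{M}\right)|z|^2-(M-1)|\theta_k|^2, $$

and the choice $M=2$ yields

$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) \leq \frac{\|g-g^0\|}{2\pi}\,e^{|\theta_k|^2}\int_{\mathbb{C}}\left(2\pi e^{-|z|^2}\right)^{1/2}dz \leq C\,\|g-g^0\|\,e^{|\theta_k|^2}. \tag{4.12} $$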



To control the factor $e^{|\theta_k|^2}$ in (4.12), we first establish that the posterior distribution asymptotically only weights functions $f$ with bounded Fourier coefficients. We hence denote

$$ A_n := \{(f,g) : \exists k\in\mathbb{Z},\ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) > M\log(n)^{-\nu}\} $$






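and the two sets

$$ B := \{f : \forall k\in\mathbb{Z},\ |\theta_k| < |\theta_k^0| + M\log(n)^{-\nu}\} \quad\text{and}\quad C := \{f : \forall k\in\mathbb{Z},\ |\theta_k^0| < |\theta_k| + M\log(n)^{-\nu}\}. $$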



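We first consider an integer $k$ and $\theta_k$ such that $|\theta_k| > |\theta_k^0| + M\log(n)^{-\nu}$; then

$$ d_{TV}(\mathbb{P}^k_{f^0,g^0},\mathbb{P}^k_{f,g}) = \frac{1}{2\pi}\int_{\mathbb{C}}\left|\int_0^{2\pi}e^{-|z-\theta_k^0e^{-ik\varphi}|^2}g^0(\varphi)\,d\varphi-\int_0^{2\pi}e^{-|z-\theta_ke^{-ik\varphi}|^2}g(\varphi)\,d\varphi\right|dz. $$

For any $z$ in the centered complex ball $B_n = B\left(0,\frac{M\log(n)^{-\nu}}{3}\right)$, one has for any $\varphi\in[0,2\pi]$

$$ |z-\theta_k^0e^{-ik\varphi}| \leq |\theta_k^0|+\frac{M\log(n)^{-\nu}}{3} \quad\text{and}\quad |z-\theta_ke^{-ik\varphi}| \geq |\theta_k^0|+\frac{2M\log(n)^{-\nu}}{3}. $$

Hence if $|\theta_k| > |\theta_k^0| + M\log(n)^{-\nu}$, one has

$$ d_{TV}(\mathbb{P}^k_{f^0,g^0},\mathbb{P}^k_{f,g}) \geq \frac{1}{2\pi}\int_{B_n}\left(e^{-[|\theta_k^0|+M\log(n)^{-\nu}/3]^2}-e^{-[|\theta_k^0|+2M\log(n)^{-\nu}/3]^2}\right)dz \geq c\,|\theta_k^0|\,e^{-|\theta_k^0|^2}\log(n)^{-3\nu} $$

for a sufficiently small absolute constant $c>0$. Since the sequence $(\theta_k^0)_{k\in\mathbb{Z}}$ is bounded, for $n$ large enough, we know that

$$ c\,|\theta_k^0|\,e^{-|\theta_k^0|^2}\log(n)^{-3\nu} > M\epsilon_n. $$

We can deduce from (4.11) that

$$ \Pi_n(B^c\mid Y_1,\ldots,Y_n) \to 0 \quad\text{as } n\to+\infty. \tag{4.13} $$

A similar argument yields

$$ \Pi_n(C^c\mid Y_1,\ldots,Y_n) \to 0 \quad\text{as } n\to+\infty. $$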
Gathering now (4.13) and (4.12), we get for a sufficiently large $M$

$$ \Pi_n(A_n\mid Y_1,\ldots,Y_n) = \Pi_n(A_n\cap B\cap C\mid Y_1,\ldots,Y_n) + \Pi_n(A_n\cap(B\cap C)^c\mid Y_1,\ldots,Y_n) $$

$$ \leq \Pi_n\left(\|g-g^0\| > \frac{M}{C}e^{-(1+\sup_k|\theta_k^0|)^2}\log(n)^{-\nu}\,\middle|\,Y_1,\ldots,Y_n\right) + \Pi_n(B^c\mid Y_1,\ldots,Y_n) + \Pi_n(C^c\mid Y_1,\ldots,Y_n). $$

We can now apply Theorem 4.1 to obtain the desired result:

$$ \Pi_n\left(\sup_{k\in\mathbb{Z}}d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f,g}) < M\log(n)^{-\nu}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 \quad\text{as } n\to+\infty. \tag{4.14} $$



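Point 3: Contraction of $\theta_k$ near $\theta_k^0$. From the arguments of Point 2, we see that

$$ \Pi_n\left(f : \forall k\in\mathbb{Z},\ \big||\theta_k|-|\theta_k^0|\big| < M\log(n)^{-\nu}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 \quad\text{as } n\to+\infty. $$

We now study the situation when $\big||\theta_k|-|\theta_k^0|\big| < M\log(n)^{-\nu}$, and we can write $\theta_k = \theta_k^0e^{i\varphi}+\xi_n$, where $\xi_n$ is a complex number such that $|\xi_n| \leq M\log(n)^{-\nu}$.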



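$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f^0,g^0}) = \frac{1}{2\pi}\int_{\mathbb{C}}|F(z)|\,dz, \quad\text{where}\quad F(z) := \int_0^{2\pi}\left(e^{-|z-\theta_ke^{-ik\alpha}|^2}-e^{-|z-\theta_k^0e^{-ik\alpha}|^2}\right)g^0(\alpha)\,d\alpha. $$

Note that $F(0) = O(|\xi_n|)$ since $|\theta_k| = |\theta_k^0| + O(|\xi_n|)$, and a Taylor expansion near $z=0$ yields, at first order in $z$ and $\xi_n$,

$$ F(z) = 2e^{-|\theta_k^0|^2}\int_0^{2\pi}\left[\mathrm{Re}\left(z\,\overline{\theta_k^0e^{i\varphi}e^{-ik\alpha}}\right)-\mathrm{Re}\left(z\,\overline{\theta_k^0e^{-ik\alpha}}\right)\right]g^0(\alpha)\,d\alpha + o(|z|) + O(|\xi_n|). $$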



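If one now uses $\theta_k = \theta_k^0e^{i\varphi}+O(\log(n)^{-\nu})$, the computation of the integral above yields, for $c<2$ and $\eta$ small enough such that $|z|<\eta$:

$$ |F(z)| \geq c\,e^{-|\theta_k^0|^2}|\sin(\varphi/2)|\,\left|\mathrm{Re}(z\,w_k)\right| + O(\log(n)^{-\nu}), \qquad w_k := e^{i\varphi/2}\theta_k^0c_{-k}(g^0). $$

Now, denote $u := \overline{w_k}/|w_k|$, which is a complex number of modulus 1, and let $v := ue^{i\pi/2}$. The vector $v$ is orthogonal to $u$, and $z$ may be decomposed as

$$ z = au+bv, \qquad\text{so that}\qquad \mathrm{Re}(z\,w_k) = a\,|w_k|. $$

We then choose $|b|<|a|/2$ and denote by $\mathcal{R}_a$ the region where $z$ lies. For $a<\eta$ small enough, we obtain that there exists an absolute constant $c$ independent of $k$ such that

$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f^0,g^0}) \geq \frac{1}{2\pi}\int_{\mathcal{R}_a}|F(z)|\,dz \geq c\,\eta^3e^{-|\theta_k^0|^2}|\sin(\varphi/2)|\,|\theta_k^0|\,|c_{-k}(g^0)| + O(\log(n)^{-\nu}). $$

Since $|\theta_k-\theta_k^0| = 2|\sin(\varphi/2)||\theta_k^0| + O(\log(n)^{-\nu})$, we get that

$$ d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f^0,g^0}) \geq c\,\eta^3e^{-|\theta_k^0|^2}|c_{-k}(g^0)|\,|\theta_k-\theta_k^0| + O(\log(n)^{-\nu}). \tag{4.15} $$

Thus, we can conclude using (4.14) and (4.15) that there exists a sufficiently large $M$ such that

$$ \Pi_n\left(f : \sup_{k\in\mathbb{Z}}|(\theta_k-\theta_k^0)c_{-k}(g^0)| < M\log(n)^{-\nu}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 \tag{4.16} $$

as $n\to+\infty$.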

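Point 4: Contraction around $f^0$. We can now produce a proof very similar to the one used at the end of Theorem 4.1:

$$ \|f-f^0\|^2 = \sum_{|\ell|\geq k_n}|\theta_\ell-\theta_\ell^0|^2 + \sum_{|\ell|<k_n}\frac{|\theta_\ell-\theta_\ell^0|^2\,|c_{-\ell}(g^0)|^2}{|c_{-\ell}(g^0)|^2} \lesssim k_n^{-2s} + k_n^{2\beta+1}\sup_{|\ell|<k_n}|\theta_\ell-\theta_\ell^0|^2|c_{-\ell}(g^0)|^2. $$

Hence, (4.16) implies

$$ \Pi_n\left(f : \|f-f^0\|^2 \lesssim k_n^{-2s}+k_n^{2\beta+1}\log(n)^{-2\nu}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 \quad\text{as } n\to+\infty. $$

The optimal choice of the frequency cut-off is $k_n = (\log n)^{\frac{2\nu}{2\beta+2s+1}}$, which yields

$$ \Pi_n\left(f : \|f-f^0\|^2 < M(\log n)^{-\frac{4\nu s}{2\beta+2s+1}}\,\middle|\,Y_1,\ldots,Y_n\right) \to 1 $$

as $n\to+\infty$. This last result is the desired inequality.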



□ 
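The exponent arithmetic of Point 4 can be checked exactly with rational numbers. The sketch below, with the hypothetical helper `point4_exponents`, plugs the cut-off $k_n = (\log n)^{2\nu/(2\beta+2s+1)}$ into the two terms $k_n^{-2s}$ and $k_n^{2\beta+1}\log(n)^{-2\nu}$ and returns the resulting powers of $\log n$; both should equal $-4\nu s/(2\beta+2s+1)$. The sample values of $\nu$, $s$ and $\beta$ are arbitrary.

```python
from fractions import Fraction


def point4_exponents(nu, s, beta):
    """Return (gamma, bias, variance) such that, with k_n = (log n)^gamma,
    k_n^{-2s} = (log n)^bias and k_n^{2beta+1} log(n)^{-2nu} = (log n)^variance."""
    nu, s, beta = Fraction(nu), Fraction(s), Fraction(beta)
    gamma = 2 * nu / (2 * beta + 2 * s + 1)      # exponent of the cut-off k_n
    bias = -2 * s * gamma                        # truncation term k_n^{-2s}
    variance = (2 * beta + 1) * gamma - 2 * nu   # term k_n^{2beta+1} log(n)^{-2nu}
    return gamma, bias, variance


if __name__ == "__main__":
    gamma, bias, variance = point4_exponents(1, 2, Fraction(5, 2))
    print(gamma, bias, variance)  # the last two coincide
```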
Remark 4.1. The lower bound obtained on $d_{TV}(\mathbb{P}^k_{f,g^0},\mathbb{P}^k_{f^0,g^0})$ is important to understand how one should build an appropriate net of functions $(f_j,g_j)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ that are hard to distinguish according to the Total Variation distance. When $|\theta_k|\neq|\theta_k^0|$, it is quite easy to distinguish the two hypotheses, but this is far from being the case when their moduli are equal. In such a case, the behaviour of the Fourier coefficients of $g^0$ becomes important. This is a clue to exhibit an efficient lower bound through the Fano lemma, for instance. This is detailed in the next paragraph.

5. Lower bound from a frequentist point of view. 

5.1. Link with the convolution with unknown variance situation. We now complete our study of the Shape Invariant Model with a short investigation of how one could obtain lower bounds in the frequentist paradigm. Several methods could be considered. The first one would be to use results from the literature, such as the works [Mat02] or [BM05]. Indeed, in the convolution model with unknown variance

$$ Y_i = X_i+\epsilon_i,\quad \forall i\in\{1\ldots n\},\quad (X_i)_{i=1\ldots n}\sim g, \tag{5.1} $$

we already know that one cannot beat some power of $\log n$ for the convergence rate of any estimator of both $g$ and the variance $\sigma^2$ of the noise. This nice result is obtained using the so-called van Trees inequality, which is a Bayesian Cramér-Rao bound (see for instance [GL95] for further details). However, their result does not apply here: Proposition 4.1 p. 17 is much more optimistic, since we obtain there a polynomial rate for the posterior contraction around $\theta_1^0$.

First, note that the results given by [Mat02] and Proposition 4.1 are not contradictory. Indeed, [Mat02] considers lower bounds over a larger class than the estimation problem of $\theta_1$ written as (4.6): from a minimax point of view, the supremum over all hypotheses is taken over a somewhat larger set than ours. Moreover, if one considers (4.6), the law of the shifted coefficient is supported by a circle instead of the whole complex plane, which would be the natural extension of (5.1). Hence, $g$ is a singular measure with respect to the noise measure: the ability to go beyond logarithmic convergence rates is certainly due to the degenerate nature of our problem with respect to the Gaussian complex noise. This is an important structural information which is not available when one considers general problems such as (5.1).

5.2. Lower bound. Following these considerations, we are thus driven to build nets of hypotheses that are hard to distinguish, and then to apply classical tools for lower bound results. We have chosen to use the Fano Lemma (see [IH81] for instance) instead of Le Cam's method, since we will only be able to find a discrete (instead of convex) set of pairs $(f_j,g_j)$ in $\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ that are close according to the Total Variation distance. We first recall the version of the Fano Lemma we use.



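Lemma 5.1 (Fano's Lemma). Let $r\geq2$ be an integer and $\mathcal{M}_r\subset\mathcal{P}$ a set which contains $r$ probability distributions indexed by $j=1\ldots r$ such that

$$ \forall j\neq j',\quad d(\theta(P_j),\theta(P_{j'})) \geq \alpha_r, $$

and

$$ d_{KL}(P_j,P_{j'}) \leq \beta_r. $$

Then, for any estimator $\hat\theta$, the following lower bound holds:

$$ \max_j\mathbb{E}_j\,d(\hat\theta,\theta(P_j)) \geq \frac{\alpha_r}{2}\left(1-\frac{\beta_r+\log2}{\log r}\right). $$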

We now derive our lower bound result.



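Theorem 5.1. There exists a sufficiently small $c$ such that the minimax rates of estimation over $\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)$ satisfy

$$ \liminf_{n\to+\infty}(\log n)^{2s+2}\inf_{\hat f\in\mathcal{F}_s}\sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat f-f\|^2 \geq c, $$

and, for any $\beta>\nu+1/2$,

$$ \liminf_{n\to+\infty}(\log n)^{2\beta+2}\inf_{\hat g\in\mathfrak{M}_\nu([0,1])(A)}\sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat g-g\|^2 \geq c. $$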

Proof. We adapt the Fano Lemma to our setting and look for a set $(f_j,g_j)_{j=1\ldots p_n}$ such that the laws $\mathbb{P}_{f_j,g_j}$ are close to each other while the functional parameters $f_j$ or $g_j$ are rather different. A careful reading of the Bayesian contraction rates is informative to build $p_n$ hypotheses which are difficult to distinguish. First, since each $f_j$ should belong to $\mathcal{F}_s$, we must impose $\theta_1(f_j)>0$ for any $f_j$. From Proposition 4.1, we know that one can easily distinguish two laws $\mathbb{P}_{f_j,g_j}$ and $\mathbb{P}_{f_{j'},g_{j'}}$ as soon as $\theta_1(f_j)\neq\theta_1(f_{j'})$. Hence, we build our net using a common choice for the first Fourier coefficient of each $f_j$. For instance, we impose that

$$ \forall j\in\{1\ldots p_n\},\quad \theta_1(f_j) = 1. $$
Point 1: Net of functions $(f_j)_{j=1\ldots p_n}$. We choose the following construction:

$$ \forall j\in\{1\ldots p_n\},\quad f_j(x) = e^{i2\pi x}+p_n^{-s}e^{i2\pi j/p_n}e^{i2\pi p_nx}. \tag{5.2} $$

The number $p_n$ of elements in the net will be adjusted in the sequel and will grow to $+\infty$. Note that our construction naturally ensures that each element of the net $(f_j)_{j=1\ldots p_n}$ belongs to $\mathcal{F}_s$, since the modulus of the $p_n$-th Fourier coefficient is of size $p_n^{-s}$. Finally, we have the following elementary inequality: for all $j\neq j'$,

$$ \|f_j-f_{j'}\|^2 \geq p_n^{-2s}\left|e^{i2\pi/p_n}-1\right|^2 = 4p_n^{-2s}\sin^2(\pi/p_n) \underset{n\to+\infty}{\sim} 4\pi^2p_n^{-2s-2}. $$
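The separation of the net (5.2) can be verified numerically. This sketch, assuming $f_j(x) = e^{i2\pi x}+p^{-s}e^{i2\pi j/p}e^{i2\pi px}$ and the $L^2([0,1])$ norm, computes the smallest pairwise squared distance with a Riemann sum (exact here, since $f_j-f_{j'}$ has constant modulus) and compares it with $4p^{-2s}\sin^2(\pi/p)$ and its equivalent $4\pi^2p^{-2s-2}$. The helper names are hypothetical.

```python
import numpy as np


def f_net(p, s):
    """Rows j = 1..p: f_j(x) = e^{i2pi x} + p^{-s} e^{i2pi j/p} e^{i2pi p x} on a grid of [0, 1)."""
    m = 4 * p                                   # grid size (any size works for these differences)
    x = np.arange(m) / m
    idx = np.arange(1, p + 1)[:, None]          # net index j as a column, for broadcasting
    return (np.exp(2j * np.pi * x)
            + p ** (-s) * np.exp(2j * np.pi * idx / p) * np.exp(2j * np.pi * p * x))


def min_sq_distance(p, s):
    """min over j != j' of ||f_j - f_j'||^2 in L^2([0, 1]), via a Riemann sum."""
    vals = f_net(p, s)
    return min(np.mean(np.abs(vals[a] - vals[b]) ** 2)
               for a in range(p) for b in range(a + 1, p))


if __name__ == "__main__":
    s = 1.0
    for p in (10, 50, 100):
        exact = 4 * p ** (-2 * s) * np.sin(np.pi / p) ** 2
        print(p, min_sq_distance(p, s), exact, 4 * np.pi ** 2 * p ** (-2 * s - 2))
```

The minimum is attained for neighbouring indices $|j-j'| = 1$, which is where the factor $\sin^2(\pi/p)$ comes from.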

Point 2: Net of measures $(g_j)_{j=1\ldots p_n}$. The core of the lower bound is how to adjust the measures of the random shifts to make the distributions $\mathbb{P}_{f_j,g_j}$, $j=1\ldots p_n$, as close as possible. First, remark that the Fano Lemma 5.1 is formulated with the entropy between laws, which is quite difficult to handle when dealing with mixtures. In the sequel, we will still use the Total Variation distance, and then use the following chain of inequalities: for all $j\neq j'$,



2 



2s„- 2/ /„ \ ^_2 -2s-2 



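Hence, from the tensorisation of the entropy, we must find a net such that $d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_{j'},g_{j'}}) \leq \eta_n$, with $\eta_n$ small enough for the entropy between the $n$-fold products to remain $O(1)$, to obtain a tractable application of the Fano Lemma (in which $P_j = \mathbb{P}_{f_j,g_j}^{\otimes n}$). This imposes to build mixture laws such that $d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_{j'},g_{j'}}) \lesssim (n\log n)^{-2}$. From the triangle inequality, it is sufficient to build $(g_j)_{j=1\ldots p_n}$ satisfying

$$ \forall j\in\{1\ldots p_n\},\quad d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_1,g_1}) \lesssim (n\log n)^{-2}. \tag{5.3} $$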

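For the sake of convenience, we will omit the dependence of $p_n$ on $n$ and simplify the notation to $p$. In a similar way, $\theta_p^j$ will denote the $p$-th Fourier coefficient of $f_j$, given by $\theta_p^j = e^{i\alpha_j}\theta_p$, where $\alpha_j = 2\pi j/p$ and $\theta_p = p^{-s}$. From our choice of $(f_j)_{j=1\ldots p}$ given by (5.2), we have

$$ d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_1,g_1}) = \frac{1}{\pi^2}\int_{\mathbb{C}\times\mathbb{C}}\left|\int_0^{2\pi}\left(e^{-|z_1-e^{i\varphi}|^2-|z_2-\theta_p^je^{ip\varphi}|^2}g_j(\varphi)-e^{-|z_1-e^{i\varphi}|^2-|z_2-\theta_p^1e^{ip\varphi}|^2}g_1(\varphi)\right)d\varphi\right|dz_1dz_2. $$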
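We will use the smoothness of Gaussian densities to obtain a suitable upper bound. Call $F$ the function defined on $\mathbb{R}^4$ by

$$ F(x_1,y_1,x_2,y_2) := \int_0^{2\pi}\left(e^{-\|z-\theta^j\cdot\varphi\|^2}g_j(\varphi)-e^{-\|z-\theta^1\cdot\varphi\|^2}g_1(\varphi)\right)d\varphi, $$

where $z = (x_1+iy_1,x_2+iy_2)$ and $\theta^j\cdot\varphi = (e^{i\varphi},\theta_p^je^{ip\varphi})$.

To control $F$, we adapt the proof of Proposition 3.3 of [BG13]. Only a sketch of the proof for this point is given here; please see [BG13] for the details. We use a truncation on $(x_1,x_2,y_1,y_2)\in\mathcal{B}_{R_n} := B_{\mathbb{R}^2}(0,R_n)^2$, together with the key inequality (that comes from a Taylor expansion):

$$ \forall k\in\mathbb{N},\ \forall y\in\mathbb{R}_+,\quad \left|e^{-y}-\sum_{j=0}^{k-1}\frac{(-y)^j}{j!}\right| \leq \frac{y^k}{k!}. \tag{5.4} $$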
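The Taylor inequality behind (5.4), read here as $\big|e^{-y}-\sum_{j=0}^{k-1}(-y)^j/j!\big|\leq y^k/k!$ for $y\geq0$ (the alternating-series bound), is easy to check numerically. This Python sketch evaluates both sides on a grid; the grid and the range of $k$ are arbitrary, and very small $y$ combined with large $k$ is avoided because floating-point cancellation then dominates the remainder.

```python
import math


def taylor_remainder(y, k):
    """|e^{-y} - sum_{j=0}^{k-1} (-y)^j / j!|."""
    partial = sum((-y) ** j / math.factorial(j) for j in range(k))
    return abs(math.exp(-y) - partial)


def bound(y, k):
    """Right-hand side y^k / k! of the inequality."""
    return y ** k / math.factorial(k)


if __name__ == "__main__":
    y = 2.0
    for k in range(1, 11):
        print(k, taylor_remainder(y, k), bound(y, k))
```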



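Inside $\mathcal{B}_{R_n}$, we need to satisfy some constraints on the Fourier coefficients. Since here the only non-null Fourier coefficients of $f_j$ are of order 1 and $p$, we finally have to ensure that

$$ \forall m,\ell<d,\ \forall(\sigma,\sigma')\in\{-1,+1\}^2,\quad c_{\sigma m+\sigma'\ell p}(g_j)\,e^{i\sigma'\ell\alpha_j} = c_{\sigma m+\sigma'\ell p}(g_1). \tag{5.5} $$

Hence, the maximal size of $d$ is $d = p/4$. We have

$$ d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_1,g_1}) = \pi^{-2}\int_{\mathcal{B}_{R_n}^c}|F(x_1,y_1,x_2,y_2)|\,dx_1dy_1dx_2dy_2 + \pi^{-2}\int_{\mathcal{B}_{R_n}}|F(x_1,y_1,x_2,y_2)|\,dx_1dy_1dx_2dy_2 $$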



[ {vi^yi' ) ^ (p/4)^ ' 

where the last point is deduced from inequality (5.4). We now choose $R_n := 3\sqrt{\log n}$, so that the Gaussian tail term is smaller than $(n\log n)^{-2}$, as required in condition (5.3). We then control the last term of the last inequality with the Stirling formula: if one chooses $p_n = \kappa\log n$ with $\kappa>12$, this term is also smaller than $(n\log n)^{-2}$, so that

$$ d_{TV}(\mathbb{P}_{f_j,g_j},\mathbb{P}_{f_1,g_1}) \lesssim (n\log n)^{-2}. $$

Such a choice of $R_n$ and $p_n$ ensures that (5.3) is fulfilled.

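We have to make sure that our conditions (5.5) on the Fourier coefficients of the $g_j$'s lead to valid densities. Take for instance, for some $\beta>\nu+1/2$, a constant $a>0$ small enough so that the resulting densities belong to $\mathfrak{M}_\nu([0,1])(A)$ (the series $\sum_{k}|k|^{2\nu-2\beta}$ converges since $2\beta-2\nu>1$) and $a\sum_{k\in\mathbb{Z}^*}|k|^{-\beta}\leq1$. Then take $c_0(g_j) := 1$ and, for all $k\in\mathbb{Z}^*$, $c_k(g_1) := a|k|^{-\beta}$. This ensures that

$$ \sum_{k\in\mathbb{Z}^*}|c_k(g_j)| \leq 1, $$

and therefore all $g_j$ remain nonnegative.

Note that the densities $g_j$ fulfill the condition appearing in Theorem 2.3; the lower bounds below are therefore also valid in this slightly smaller model.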

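We can then conclude our proof: we apply the Fano Lemma (see Lemma 5.1) with $\alpha_n = p_n^{-s-1}$ and $\beta_n = O(1)$ for the parametrization of $(f_j)_{j=1\ldots p_n}$. We then deduce the first lower bound

$$ \liminf_{n\to+\infty}(\log n)^{2s+2}\inf_{\hat f\in\mathcal{F}_s}\sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat f-f\|^2 \geq c. $$

Our construction also implies that the $g_j$ are rather different from each other since, for instance, $c_p(g_j)e^{i\alpha_j} = c_p(g_1) = c_p(g_{j'})e^{i\alpha_{j'}}$. Thus

$$ \forall j\neq j',\quad \|g_j-g_{j'}\|^2 \geq |c_p(g_j)-c_p(g_{j'})|^2 = a^2p^{-2\beta}\left|e^{-i\alpha_j}-e^{-i\alpha_{j'}}\right|^2 \geq c\,p^{-2\beta-2}. $$

Applying the Fano Lemma to $(g_j)_{j=1\ldots p_n}$, we get

$$ \liminf_{n\to+\infty}(\log n)^{2\beta+2}\inf_{\hat g\in\mathfrak{M}_\nu([0,1])(A)}\sup_{(f,g)\in\mathcal{F}_s\times\mathfrak{M}_\nu([0,1])(A)}\mathbb{E}_{f,g}\|\hat g-g\|^2 \geq c. $$

This ends the proof of the lower bound. □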

6. Concluding remarks. In this paper, we exhibit a suitable prior which makes it possible to obtain a contraction rate of the posterior distribution near the true underlying distribution $\mathbb{P}_{f^0,g^0}$.

Up to a non-restrictive condition, we also obtain a large identifiability class to retrieve $f^0$ and $g^0$. However, in this class the contraction of the posterior is dramatically damaged, since we then obtain a logarithmic rate instead of a polynomial one. As pointed out by our frequentist lower bounds, this last point cannot be much improved when the standard $L^2$ distance is used to measure the neighbourhoods of $f^0$ and $g^0$. Remark that we do not obtain exactly the same rates for our lower and upper bounds of reconstruction. This may be due to the rough inequality used to obtain (4.1).

Indeed, the degradation of the contraction rate occurs when one tries to invert the identifiability map $\mathcal{I}:(f,g)\mapsto\mathbb{P}_{f,g}$. This difficulty should be understood as a new consequence of the impossibility to exactly recover the random shift parameters when only $n$ grows to $+\infty$. Such a phenomenon is highlighted in several papers such as [BG10] or [BGKM12].

However, it may be possible to obtain a polynomial rate using a more appropriate distance adapted to our problem of randomly shifted curves:

$$ d_{Frechet}(f_1,f_2) := \inf_{\tau\in[0,1]}\|f_1(\cdot-\tau)-f_2\|. $$

We plan to tackle this problem in a future work. The important requirement in this view is to find some relations between the neighbourhoods of $\mathbb{P}_{f^0,g^0}$ and the neighbourhoods of $f^0$ according to the distance $d_{Frechet}$.

APPENDIX A: SMALL BALL PROBABILITY FOR INTEGRATED BROWNIAN BRIDGE

In the sequel, we still use the notation defined by (2.6) to refer to the probability distribution which is proportional to $e^{w}$. We detail here how one can obtain a lower bound of the prior weight around any element $g^0$. Since we deal with a log-density model, according to Lemma A.1 (which is Lemma 3.1 of [vdVvZ08a]) it will be enough to find a lower bound of the weight around $w_0$, where $g^0\propto e^{w_0}$.

Lemma A.1 ([vdVvZ08a]). For any real measurable functions $v$ and $w$ on $[0,1]$, the Hellinger distance between $p_v$ and $p_w$ is bounded by

$$ d_H(p_v,p_w) \leq \|v-w\|_\infty\,e^{\|v-w\|_\infty/2}. $$
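A quick numerical check of Lemma A.1, $d_H(p_v,p_w)\leq\|v-w\|_\infty e^{\|v-w\|_\infty/2}$ with $p_u\propto e^u$, is sketched below. Assumptions: the Hellinger distance is taken without a factor 1/2, i.e. $d_H(p,q)^2 = \int(\sqrt p-\sqrt q)^2$; Lebesgue measure on $[0,1]$ is replaced by the uniform measure on a grid (the argument behind the lemma applies to any probability measure, so the check remains meaningful); and $v$, $w$ are random trigonometric polynomials chosen only for illustration.

```python
import numpy as np


def hellinger_log_density(v, w):
    """d_H(p_v, p_w) with p_u = e^u / mean(e^u), for the uniform measure on the grid."""
    pv = np.exp(v) / np.mean(np.exp(v))
    pw = np.exp(w) / np.mean(np.exp(w))
    return np.sqrt(np.mean((np.sqrt(pv) - np.sqrt(pw)) ** 2))


def lemma_a1_bound(v, w):
    """Right-hand side ||v - w||_inf * exp(||v - w||_inf / 2) of Lemma A.1."""
    d = np.max(np.abs(v - w))
    return d * np.exp(d / 2.0)


def random_trig(x, rng, degree=5, scale=1.0):
    """Random real trigonometric polynomial of the given degree, evaluated at x."""
    k = np.arange(1, degree + 1)[:, None]
    a, b = rng.normal(scale=scale, size=(2, degree, 1))
    return np.sum(a * np.cos(2 * np.pi * k * x) + b * np.sin(2 * np.pi * k * x), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = (np.arange(1000) + 0.5) / 1000          # midpoint grid of [0, 1]
    v = random_trig(x, rng)
    w = v + random_trig(x, rng, scale=0.05)     # small perturbation of v
    print(hellinger_log_density(v, w), lemma_a1_bound(v, w))
```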

We now obtain a lower bound of the prior weight on the set previously defined as

$$ \mathcal{G}_\epsilon := \{g\in\mathfrak{M}_\nu([0,1])(2A) : d_{TV}(g,g^0) < \epsilon\}. $$

This bound is given by the following Theorem.

Theorem A.1. The prior $\mathcal{Q}_{\nu,A}$ defined by (2.5) and (2.6) satisfies, for $\epsilon$ small enough,

$$ \mathcal{Q}_{\nu,A}(\mathcal{G}_\epsilon) \geq c\,e^{-\epsilon^{-1/(k_\nu+1/2)}}, $$

where $c$ is a constant which does not depend on $\epsilon$.

Proof. The proof is divided into 4 steps.
Structure of the prior. We denote $w_0 := \log g^0$, which is a $k_\nu$-differentiable function on $[0,1]$ that can be extended to a 1-periodic element of $C^{k_\nu}(\mathbb{R})$. We denote by $q$ the prior defined by (2.5) on such a class of periodic functions (and omit the dependence on $\nu$ and $A$ for the sake of simplicity). The prior $\mathcal{Q}_{\nu,A}$ is then derived from $q$ through (2.6). We can remark that our situation looks similar to the one described in paragraph 4.1 of [vdVvZ08a] for integrated Brownian motion. Indeed, the log-density $w_0$ should be approximated by some "Brownian bridge started at random" using

$$ w = J_{k_\nu}(B) + \sum_{i=0}^{k_\nu}Z_i\psi_i, $$

where $B$ is a real Brownian bridge between 0 and 1. We suppose $B$ built as $B_t = W_t-tW_1$ on the basis of a Brownian motion $W$ on $[0,1]$. Of course, in the above equation, one can immediately check that $J_{k_\nu}(B)(0) = J_{k_\nu}(B)(1) = 0$. Moreover, the relation $J_k(f)' = J_{k-1}(f)-\int_0^1J_{k-1}(f)$ and an induction argument yield

$$ \forall j\in\{0,\ldots,k_\nu\},\quad J_{k_\nu}(B)^{(j)}(0) = J_{k_\nu}(B)^{(j)}(1). $$

Hence, $J_{k_\nu}(B)$ and its first $k_\nu$ derivatives are 1-periodic. Of course, the functions $\psi_i$ are also 1-periodic and $C^\infty(\mathbb{R})$, and thus our prior $q$ generates admissible functions of $[0,1]$ to approximate $w_0$. We will denote by $\mathcal{C}^{k_\nu}_1$ this set of admissible trajectories, that is, 1-periodic functions which are $k_\nu$ times differentiable.
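The periodicity of $J_{k_\nu}(B)$ can be illustrated on a discretised Brownian bridge. This sketch assumes that $J_k$ is the $k$-fold iterate of $J(f)(t) = \int_0^tf-t\int_0^1f$, so that $J_k(f)' = J_{k-1}(f)-\int_0^1J_{k-1}(f)$; the exact definition of $J_k$ belongs to (2.5), which is not reproduced in this section, so this is an assumption. The grid size, seed and helper names are hypothetical; the output shows that $J_k(B)$ vanishes at both endpoints and that its one-sided discrete derivatives at 0 and 1 nearly agree for $k\geq2$.

```python
import numpy as np


def brownian_bridge(m, rng):
    """B_t = W_t - t W_1 on the grid t_i = i / m, i = 0..m."""
    t = np.linspace(0.0, 1.0, m + 1)
    W = np.concatenate([[0.0], np.cumsum(rng.normal(scale=np.sqrt(1.0 / m), size=m))])
    return t, W - t * W[-1]


def J(f, t):
    """Assumed step J(f)(t) = int_0^t f - t * int_0^1 f, with the trapezoidal rule."""
    dt = t[1] - t[0]
    F = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)])
    return F - t * F[-1]


def J_k(f, t, k):
    """k-fold application of J (the hypothetical J_k(B) when f = B)."""
    for _ in range(k):
        f = J(f, t)
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, B = brownian_bridge(20000, rng)
    dt = t[1] - t[0]
    for k in (1, 2, 3):
        g = J_k(B, t, k)
        print(k, g[0], g[-1], (g[1] - g[0]) / dt, (g[-1] - g[-2]) / dt)
```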

Transformation of the Brownian bridge. We denote by $\mathbb{B}_1$ the separable Banach space of Brownian bridge trajectories between 0 and 1, and $\mathbb{B}_2 = \mathbb{R}^{k_\nu+1}$. It is possible to check that the map

$$ T : (B,Z_0,\ldots,Z_{k_\nu}) \mapsto J_{k_\nu}(B) + \sum_{i=0}^{k_\nu}Z_i\psi_i $$

is injective from the Banach space $\mathbb{B} = \mathbb{B}_1\times\mathbb{B}_2$ to the set $\tilde{\mathbb{B}} := T(\mathbb{B})$. More precisely, a recursive argument shows that each map $J_k(B)$ may be decomposed as

$$ \forall t\in[0,1],\quad J_k(B)(t) = I_k(W)(t) + \sum_{i=1}^{k+1}c_{i,k}(W)\,t^i, \tag{A.1} $$

where the $c_{i,k}(W)$ are explicit linear functionals that depend on $W_1$ and on the collection $\left(\int_0^1(1-t)^{j-1}W_t\,dt\right)_{1\leq j\leq k}$ (and not on $t$), and $I_k$ is the operator used in [vdVvZ08a], defined as $I_1(f) = \int_0^\cdot f$ and $I_k = I_1\circ I_{k-1}$ for $k\geq2$.

Hence,

$$ \forall t\in[0,1],\quad T(B,Z_0,\ldots,Z_{k_\nu})^{(k_\nu)}(t) = W_t + c_{k_\nu,k_\nu}(W)\,k_\nu! + c_{k_\nu+1,k_\nu}(W)\,(k_\nu+1)!\,t + \sum_{i=0}^{k_\nu}Z_i\psi_i^{(k_\nu)}(t). \tag{A.2} $$

According to the Karhunen-Loève representation of the Brownian bridge (as a sine series), and since each $\psi_i^{(k_\nu)}$ possesses a non-vanishing cosine term $t\mapsto\cos(2\pi it)$, we then deduce that

$$ T(B^1,Z_0^1,\ldots,Z_{k_\nu}^1) = T(B^2,Z_0^2,\ldots,Z_{k_\nu}^2) $$

necessarily implies that $Z_i^1 = Z_i^2$ for $i\in\{0,\ldots,k_\nu\}$, and next that $W^1 = W^2$ and $B^1 = B^2$.

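Thus, it is possible to apply Lemma 7.1 of [vdVvZ08b] to deduce that the Reproducing Kernel Hilbert Space (shortened as RKHS in the sequel) associated to the Gaussian process (2.5) in $\tilde{\mathbb{B}}$ is $\tilde{\mathbb{H}} := T\mathbb{H}$, where $\mathbb{H}$ is the RKHS derived in the simpler space $\mathbb{B} = \mathbb{B}_1\times\mathbb{B}_2$. Moreover, the map $T$ is an isometry from $\mathbb{H}$ to $\tilde{\mathbb{H}}$ for the RKHS norms. Finally, since the components in $\mathbb{B}_1$ and $\mathbb{B}_2$ are independent, the RKHS $\mathbb{H}$ may be described as

$$ \mathbb{H} := \left\{(f,z)\in AC([0,1])\times\mathbb{R}^{k_\nu+1} : f(0) = f(1) = 0,\ \int_0^1f'^2<\infty\right\}, $$

where $AC([0,1])$ is the set of absolutely continuous functions on $[0,1]$, and $\mathbb{H}$ is endowed with the following inner product:

$$ \left\langle(f_1,z^1),(f_2,z^2)\right\rangle_{\mathbb{H}} := \int_0^1f_1'f_2' + \left\langle z^1,z^2\right\rangle_{\mathbb{R}^{k_\nu+1}}. $$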

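Extremal derivatives. We study the influence of the process

$$ b := \sum_{i=0}^{k_\nu}Z_i\psi_i $$

and look for realizations of $(Z_i)_i$ that suitably match arbitrary values $w_0^{(j)}(0) = w_0^{(j)}(1)$. In this view, simple computations yield that for any integer $p$:

$$ \psi_k^{(2p)}(t) = (-1)^p(2\pi k)^{2p}\left[\sin(2\pi kt)+\cos(2\pi kt)\right] $$

and

$$ \psi_k^{(2p+1)}(t) = (-1)^p(2\pi k)^{2p+1}\left[-\sin(2\pi kt)+\cos(2\pi kt)\right]. $$

Hence, the matching of $w_0^{(j)}(0)$ by $b^{(j)}(0)$ is quantified by

$$ b^{(j)}(0) = \sum_{k=0}^{k_\nu}(-1)^{\lfloor j/2\rfloor}(2\pi k)^jZ_k. $$

If one denotes $a_k := 2\pi k$, the vector of derivatives $d_0 := (w_0^{(j)}(0))_{j=0\ldots k_\nu}$, $Z = (Z_0,\ldots,Z_{k_\nu})$ and the square matrix of size $(k_\nu+1)\times(k_\nu+1)$:

$$ A_0 := \begin{pmatrix} 1 & 1 & 1 & \cdots & 1\\ 0 & a_1 & a_2 & \cdots & a_{k_\nu}\\ 0 & -a_1^2 & -a_2^2 & \cdots & -a_{k_\nu}^2\\ 0 & -a_1^3 & -a_2^3 & \cdots & -a_{k_\nu}^3\\ 0 & a_1^4 & a_2^4 & \cdots & a_{k_\nu}^4\\ \vdots & \vdots & \vdots & & \vdots \end{pmatrix}, $$

then we are looking for values of $Z$ such that $d_0 = A_0Z$. The matrix $A_0$ is invertible since, up to the signs of its rows, it is the Vandermonde matrix built on the distinct nodes $a_0 = 0,a_1,\ldots,a_{k_\nu}$.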
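The invertibility claim can be checked directly. This sketch assumes $\psi_k(t) = \sin(2\pi kt)+\cos(2\pi kt)$, which is consistent with the derivative formulas of this paragraph (and gives $\psi_0 = 1$); it builds $A_0[j,k] = (-1)^{\lfloor j/2\rfloor}(2\pi k)^j$, compares it with a row-signed Vandermonde matrix on the nodes $a_k = 2\pi k$, and solves $d_0 = A_0Z$ for an arbitrary target $d_0$. The helper names and the target vector are hypothetical.

```python
import numpy as np


def derivative_matrix(k_nu):
    """A_0[j, k] = psi_k^{(j)}(0) = (-1)^{floor(j/2)} (2 pi k)^j for j, k = 0..k_nu,
    assuming psi_k(t) = sin(2 pi k t) + cos(2 pi k t)."""
    a = 2.0 * np.pi * np.arange(k_nu + 1)    # nodes a_k = 2 pi k, with a_0 = 0
    j = np.arange(k_nu + 1)[:, None]         # derivative order, one row per j
    signs = (-1.0) ** (j // 2)               # row signs +, +, -, -, +, +, ...
    return signs * a[None, :] ** j


def match_derivatives(d0):
    """Coefficients Z of b = sum_k Z_k psi_k such that b^{(j)}(0) = d0[j], j = 0..k_nu."""
    A0 = derivative_matrix(len(d0) - 1)
    return np.linalg.solve(A0, np.asarray(d0, dtype=float))


if __name__ == "__main__":
    d0 = [1.0, -2.0, 0.5, 3.0, -1.0]         # arbitrary target derivatives w_0^{(j)}(0)
    Z = match_derivatives(d0)
    print(Z, derivative_matrix(4) @ Z)
```

In floating point, $A_0$ becomes ill-conditioned quickly as $k_\nu$ grows, since its entries scale like $(2\pi k_\nu)^{k_\nu}$; this does not affect the exact invertibility argument.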

We can now establish that the support of the prior (the closure of $\tilde{\mathbb{B}}$) is exactly $\mathcal{C}^{k_\nu}_1$. Indeed, the support of the transformed Brownian bridge $J_{k_\nu}(B)$ is included in the set of 1-periodic functions of $\mathcal{C}^{k_\nu}_1$ which satisfy at most $k_\nu+1$ constraints on the values of their $k_\nu+1$ first derivatives at the point 0. These constraints are given by the coefficients $(c_{i,k_\nu})_{i=0\ldots k_\nu}$ in (A.2). From the invertibility of the matrix $A_0$, it is possible to match any term $w_0^{(j)}(0)$, $0\leq j\leq k_\nu$, with the additional process $b$ [see vdVvZ08b, section 10].

Small ball probability estimates. We now turn to the core of the proof of the Theorem. Since the Total Variation distance is bounded from above by the Hellinger distance, an immediate application of Lemma A.1 shows that it is sufficient to find a lower bound of $q(G_\epsilon)$, where

$$ G_\epsilon := \{w\in C^{k_\nu}([0,1]) : \|w-w_0\|_\infty < \epsilon\}. $$

Following the argument of [KLL94] on shifted Gaussian balls, we have

$$ \log q(G_\epsilon) \geq -\inf_{h\in\tilde{\mathbb{H}}:\|h-w_0\|_\infty<\epsilon/2}\|h\|_{\tilde{\mathbb{H}}}^2 + \log q\left(\|J_{k_\nu}(B)+b\|_\infty<\epsilon/2\right). $$

From the isometry $T$ from $\mathbb{H}$ to $\tilde{\mathbb{H}}$, the approximation term $\inf_{h\in\tilde{\mathbb{H}}:\|h-w_0\|_\infty<\epsilon}\|h\|_{\tilde{\mathbb{H}}}^2$ is of the same order as the approximation term that we can derive in $\mathbb{H}$, and the arguments of Theorem 4.1 in [vdVvZ08a] can be applied here to get

$$ \inf_{h\in\tilde{\mathbb{H}}:\|h-w_0\|_\infty<\epsilon}\|h\|_{\tilde{\mathbb{H}}}^2 \lesssim \epsilon^{-1/(k_\nu+1/2)}. $$

It remains to obtain a lower bound of the small ball probability of the centered Gaussian ball. Note that $b$ and $J_{k_\nu}(B)$ are independent Gaussian processes, so that the small ball probability of their sum is at least the product of the small ball probabilities of each process (with radius $\epsilon/4$). Since $b$ lives in a finite-dimensional space, $\log P(\|b\|_\infty<\epsilon)$ only decreases like $\log\epsilon$, which is negligible. Thus, the main difficulty lies in the lower bound of

$$ \phi(\epsilon) := \log P(\|J_{k_\nu}(B)\|_\infty<\epsilon). $$

Going back to (A.1), we see that $J_k(B)$ can be decomposed into two non-independent Gaussian processes: $I_k(W)$ and a polynomial $\sum_{i=1}^{k+1}c_{i,k}(W)t^i$, which is a linear functional of $W_1$ and of the collection $\left(\int_0^1(1-t)^{j-1}W_t\,dt\right)_{1\leq j\leq k}$.
Therefore 

log 0) < logPdl Jfc(B) - hiW)\\^ < e) . 
Now, applying Theorems 3.4 and 3.7 of [LS01] yields

$$ \log P(\|J_{k_\nu}(B)\|_\infty<\epsilon) \sim \log P(\|I_{k_\nu}(W)\|_\infty<\epsilon) \gtrsim -\epsilon^{-1/(k_\nu+1/2)}, $$

which is of the same order as the approximation term. Gathering now our lower bound on shifted Gaussian balls and the term above ends the proof of the Theorem. □

APPENDIX B: EQUIVALENTS ON MODIFIED BESSEL FUNCTIONS

Lemma B.1. For any $n\in\mathbb{Z}$ and $a>0$, define

$$ A_n(a) := \int_0^{2\pi}e^{a\cos(u)}\cos(nu)\,du. $$

Then the following equivalent holds:

$$ \forall a\in[0,\sqrt{n}],\quad A_n(a) \sim \frac{2\pi}{n!}\left(\frac{a}{2}\right)^n\left(1+O\left(\frac{a^2}{n}\right)\right). $$

Proof. This equivalent is related to the modified Bessel functions (see [AS64] for classical equivalents on Bessel functions and [LL10] for standard results on continuous time random walks). More precisely, $I_m(a)$ is defined as

$$ \forall m\in\mathbb{N},\ \forall a>0,\quad I_m(a) := \sum_{k\geq0}\frac{1}{k!\,(k+m)!}\left(\frac{a}{2}\right)^{2k+m}, $$

and we have (see for instance [AS64])

$$ I_0(a) + 2\sum_{m=1}^{+\infty}I_m(a)\cos(mu) = e^{a\cos(u)}. $$

Hence, we easily deduce that $A_n(a) = 2\pi I_n(a)$. For small $a$, it is possible to use standard results on modified Bessel functions. Equation (9.7.7) of [AS64], p. 378, yields

$$ \forall a\in[0,\sqrt{n}],\quad I_n(a) \sim \frac{1}{n!}\left(\frac{a}{2}\right)^n\left(1+O\left(\frac{a^2}{n}\right)\right). \tag{B.1} $$

□
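The two facts used in Lemma B.1, $A_n(a) = 2\pi I_n(a)$ and $I_n(a)\,n!/(a/2)^n = 1+O(a^2/n)$ for $a\leq\sqrt n$, can be checked with plain Python. This sketch sums the power series of $I_n$ and evaluates $A_n$ with the periodic trapezoidal rule, which converges very fast for smooth periodic integrands; the truncation levels and test points are illustrative choices.

```python
import math


def bessel_I(n, a, terms=80):
    """Power series I_n(a) = sum_{k>=0} (a/2)^{2k+n} / (k! (k+n)!), for integer n >= 0."""
    return sum((a / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def A(n, a, m=2048):
    """A_n(a) = int_0^{2pi} e^{a cos u} cos(n u) du, by the periodic trapezoidal rule."""
    h = 2.0 * math.pi / m
    return h * sum(math.exp(a * math.cos(i * h)) * math.cos(n * i * h) for i in range(m))


if __name__ == "__main__":
    for n, a in ((3, 1.0), (10, 2.0), (25, 5.0)):
        leading = (a / 2.0) ** n / math.factorial(n)        # leading term in (B.1)
        print(n, a, A(n, a) / (2 * math.pi * bessel_I(n, a)), bessel_I(n, a) / leading)
```

Since every term of the series is positive, the ratio $I_n(a)\,n!/(a/2)^n$ is always at least 1, so the leading term of (B.1) is also a genuine lower bound on $I_n$.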

This integral is strongly related to the density of a continuous time random walk: if one sets $B_n(a) = e^{-a}A_n(a)/(2\pi)$, one has $B_n(0) = 0$ for all $n\neq0$, $B_0(0) = 1$, and finally

$$ B_n'(a) = \frac{1}{2}\left(B_{n-1}(a)+B_{n+1}(a)\right)-B_n(a). $$

Hence, $B_n(a)$ is the probability that a continuous time random walk is at position $n\in\mathbb{Z}$ at time $a$. In this way, we get some asymptotic equivalents of $B_n(a)$ (and so of $A_n(a)$): from the Brownian approximation of the CTRW, we should suspect that for $a$ large enough

$$ B_n(a) \sim \frac{1}{\sqrt{2\pi a}}e^{-n^2/(2a)}. \tag{B.2} $$

Moreover, from [AS64], we know that

$$ I_n(a) \sim \frac{e^a}{\sqrt{2\pi a}}, \quad\text{as soon as } a\gg n^2, \tag{B.3} $$

and this equivalent is sharp when $a$ is large enough: from equation (9.7.1) p. 377 of [AS64], we know that

$$ \forall a>4n^2,\quad I_n(a) \geq \frac{1}{2}\times\frac{e^a}{\sqrt{2\pi a}}. $$

We remark that (B.3) yields the heuristic equivalent suspected in (B.2): $B_n(a) = e^{-a}I_n(a) \sim \frac{1}{\sqrt{2\pi a}}$, although (B.1) provides quite different information for smaller $a$. We have not pursued this asymptotic analysis further since we will see that, indeed, (B.1) is much larger than (B.3).
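The random-walk interpretation can also be checked numerically. Here $B_n(a) = e^{-a}I_{|n|}(a)$ is taken as the law at time $a$ of a walk on $\mathbb{Z}$ that jumps at rate 1 to a uniformly chosen neighbour $n\pm1$, which satisfies $B_n' = \frac{1}{2}(B_{n-1}+B_{n+1})-B_n$. The sketch evaluates $B_n$ in log-space, checks that it sums to 1 and satisfies this equation, and compares it with the Gaussian heuristic (B.2) for a large $a$. Names and sample values are illustrative.

```python
import math


def ctrw_prob(n, a, terms=600):
    """B_n(a) = e^{-a} I_{|n|}(a), with the series summed in log-space to avoid overflow."""
    n = abs(n)
    if a == 0.0:
        return 1.0 if n == 0 else 0.0
    log_half = math.log(a / 2.0)
    return sum(math.exp((2 * k + n) * log_half
                        - math.lgamma(k + 1) - math.lgamma(k + n + 1) - a)
               for k in range(terms))


def gaussian_approx(n, a):
    """Heuristic (B.2): e^{-n^2 / (2a)} / sqrt(2 pi a)."""
    return math.exp(-n * n / (2.0 * a)) / math.sqrt(2.0 * math.pi * a)


if __name__ == "__main__":
    a = 200.0
    for n in (0, 5, 10, 20):
        print(n, ctrw_prob(n, a), gaussian_approx(n, a))
```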

For $a\in[\sqrt{n},4n^2]$, we have not found any satisfactory equivalent of the modified Bessel functions. The formula of [AS64] is still tractable but yields a different equivalent which is not "uniform enough", since we need to integrate this equivalent for our Bayesian analysis. This is not so important since, for our range of application, the most important weight belongs to the smaller values of $a$.